- The paper presents a unified transformer U-net that leverages histogram-guided self-attention and frequency-adaptive refinement to effectively tackle nighttime haze and noise.
- It integrates a DCNv4-based multi-scale backbone with an auxiliary frequency-aware branch, achieving higher PSNR and SSIM than previous state-of-the-art methods.
- Extensive experiments validate enhanced color fidelity, structural clarity, and robust dehazing performance, earning top ranking in the NTIRE 2026 challenge.
HistoFusionNet: Histogram-Guided Fusion and Frequency-Adaptive Refinement for Nighttime Image Dehazing
Introduction
Nighttime image dehazing is technically challenging because haze, glow, non-uniform illumination, color distortion, and sensor noise occur together in low-light environments. Conventional dehazing approaches, which rely primarily on the atmospheric scattering model (ASM) and handcrafted priors, underperform on nighttime scenes, where the model's assumptions are routinely violated by heterogeneous and locally varying degradations. CNN- and Transformer-based architectures have considerably advanced daytime and general dehazing, but their efficacy on real nighttime datasets remains limited because they do not explicitly model dynamic-range and frequency-specific corruption.
The paper "HistoFusionNet: Histogram-Guided Fusion and Frequency-Adaptive Refinement for Nighttime Image Dehazing" (2604.03800) proposes a new architecture that synergistically integrates histogram-guided representation learning and frequency-adaptive refinement into a unified, transformer-enhanced U-shaped encoder-decoder framework. HistoFusionNet combines a DCNv4-based main branch, histogram transformer blocks at the bottleneck, an auxiliary frequency-aware branch, and a lightweight frequency-adaptive refinement module, all tailored for the multi-faceted degradation and illumination complexity of nighttime scenes.
Architectural Contributions
HistoFusionNet is architected around two main concepts: (1) dynamic-range-aware modeling via histogram-guided self-attention, and (2) cross-frequency feature refinement to enhance structural and color fidelity.
DCNv4-based Multi-Scale Backbone:
The encoder-decoder backbone leverages DCNv4 for main-branch feature extraction. DCNv4's adaptive aggregation, eschewing the softmax normalization of DCNv3, achieves improved efficiency and convergence, crucial for spatially variant nighttime noise and haze patterns.
Histogram Transformer Blocks:
Inserted at the bottleneck, these blocks perform self-attention not globally or locally, but within groups of tokens possessing similar intensity characteristics, as defined by their dynamic-range histograms. By sorting and partitioning latent features into B bins according to per-token intensity and attending within bins, HistoFusionNet directly models long-range correlations among similarly degraded regions, enhancing restoration in the presence of non-uniform illumination and glow.
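The sort, partition, and attend-within-bins procedure described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function name `bin_wise_attention`, the use of the channel mean as per-token intensity, equal-sized bins, and identity query/key/value projections are all assumptions made to keep the example short.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def bin_wise_attention(tokens, num_bins):
    """Self-attention restricted to tokens of similar intensity.

    tokens: (N, C) float array of latent features; N must be divisible
    by num_bins (an assumption of this sketch).
    """
    n, c = tokens.shape
    assert n % num_bins == 0, "sketch assumes equal-sized bins"

    # Assumption: per-token intensity is approximated by the channel mean.
    intensity = tokens.mean(axis=1)

    # Sort tokens by intensity and cut the sorted sequence into B bins.
    order = np.argsort(intensity, kind="stable")
    binned = tokens[order].reshape(num_bins, n // num_bins, c)

    # Scaled dot-product attention inside each bin only
    # (identity Q/K/V projections, another simplification).
    scores = binned @ binned.transpose(0, 2, 1) / np.sqrt(c)
    attended = softmax(scores, axis=-1) @ binned

    # Scatter the results back to the original token positions.
    out = np.empty_like(tokens)
    out[order] = attended.reshape(n, c)
    return out
```

Because each output token is a convex combination of the tokens in its own bin, dark and bright regions never mix in this sketch, regardless of how far apart they are spatially. A learned model would add projections, multiple heads, and possibly a different binning rule.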
Auxiliary Frequency-Aware Branch:
This complementary branch, inspired by spatial-frequency interaction networks, preserves contextual guidance by exploiting frequency-domain statistics passed through skip connections, reducing failures in both high- and low-frequency restoration.
Frequency-Adaptive Refinement Module:
In a second optimization stage, the lightweight refinement module decomposes features extracted from the coarse dehazed output into low- and high-frequency components using Fourier transforms. Adaptive mixing at multiple decoder scales and guided fusion (with learnable per-scale coefficients) dynamically balance encoder and decoder representations, allowing for targeted residual enhancement. This is crucial for recovering the fine structural textures and faithful colors that are often lost in primary dehazing.
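As a rough illustration of Fourier-based low/high-frequency decomposition followed by weighted recombination, consider the NumPy sketch below. The summary does not specify the module's internals, so everything here is an assumption: the circular low-pass mask, the `radius` cutoff, the scalar gains `low_gain` and `high_gain`, and the single fusion coefficient `alpha` (standing in for a learnable per-scale coefficient) are hypothetical choices, not the paper's design.

```python
import numpy as np


def split_frequencies(feature, radius):
    """Split a real (H, W) map into low- and high-frequency parts.

    The two parts sum back to the input exactly.
    """
    h, w = feature.shape
    # Centre the spectrum so the DC term sits at (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(feature))

    # Assumption: a hard circular mask defines the low-frequency band.
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)
    low_mask = dist <= radius

    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = feature - low
    return low, high


def refine(feature, low_gain, high_gain, radius=2):
    # Re-weight the two bands; in a trained model the gains would be learned.
    low, high = split_frequencies(feature, radius)
    return low_gain * low + high_gain * high


def guided_fusion(encoder_feat, decoder_feat, alpha):
    # Hypothetical per-scale coefficient balancing encoder and decoder features.
    return alpha * encoder_feat + (1.0 - alpha) * decoder_feat
```

With both gains set to 1.0, `refine` returns its input unchanged, so any deviation from 1.0 acts as a targeted boost or attenuation of one band. Raising `high_gain` emphasizes edges and texture, while `low_gain` mostly affects smooth illumination and color.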
Figure 1: Visual comparisons on NH-HAZE dataset. Compared to other models, HistoFusionNet exhibits higher color fidelity and effective dehazing, yielding compelling results.
Experimental Results
Datasets and Baselines:
Evaluations are conducted on four real-world datasets: NTIRE 2026 Nighttime Image Dehazing Challenge, NH-HAZE, NH-HAZE2, and Dense-Haze. SOTA comparison methods include SFSNiD, SFMN, DWT-FFC, and DehazeDCT.
Numerical Performance:
HistoFusionNet achieves the highest PSNR and SSIM values on all four datasets. For instance, it yields 27.879 dB PSNR and 0.905 SSIM on the NTIRE 2026 dataset, outperforming DehazeDCT by 0.414 dB PSNR and 0.006 SSIM. On NH-HAZE, NH-HAZE2, and Dense-Haze, similarly consistent improvements are observed.
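To put the reported PSNR gap in context, the sketch below gives the standard PSNR definition and converts a dB difference into an equivalent mean-squared-error ratio. The function names and the `max_value=1.0` convention (images scaled to [0, 1]) are assumptions for illustration. The conversion holds exactly only for a single image pair, so applying it to a dataset-averaged PSNR is an approximation.

```python
import numpy as np


def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)


def mse_ratio_from_db_gain(gain_db):
    # A PSNR gain of g dB corresponds to the MSE shrinking by 10 ** (-g / 10).
    return 10.0 ** (-gain_db / 10.0)


# The 0.414 dB margin reported over DehazeDCT on the NTIRE 2026 dataset.
ratio = mse_ratio_from_db_gain(0.414)
```

Under that single-image approximation, `ratio` is about 0.91, meaning a 0.414 dB improvement corresponds to roughly 9% lower mean squared error.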
Visual Quality:
Figures demonstrate HistoFusionNet's superiority in color preservation, haze removal efficacy, and structural detail restoration, particularly on NH-HAZE2 and Dense-Haze datasets.
Figure 2: Visual experiment results on NH-HAZE2 dataset. The model demonstrates superior performance regarding color preservation and detail maintenance.
Figure 3: Qualitative comparison on the Dense-Haze dataset. The method yields clearer structures, improved color fidelity, and superior detail recovery.
Challenge Results:
HistoFusionNet ranked 1st in the NTIRE 2026 Nighttime Image Dehazing Challenge among 22 teams, achieving the best scores in PSNR (27.88 dB), SSIM (0.91), and LPIPS (0.10), along with competitive scores on perceptual quality metrics (MUSIQ, NIQE, FID).
Figure 4: Results on the validation set of the NTIRE 2026 Challenge, demonstrating improved visibility, structural clarity, and color restoration.
Ablation Studies
Ablation on the Dense-Haze dataset quantifies the additive value of each module:
- Histogram transformer removal degrades PSNR/SSIM, evidencing the necessity of dynamic-range-aware attention for global degradation correlation.
- Exclusion of the frequency-adaptive refinement or auxiliary frequency-aware branch also decreases performance, confirming the efficacy of explicit frequency modulation for detail and color enhancement.
- Each architectural component demonstrably contributes to final dehazing performance.
Theoretical and Practical Implications
The explicit coupling of histogram-guided grouping and frequency-adaptive mechanisms sheds light on the inadequacies of traditional global attention and purely spatial CNNs for restoration under night conditions. Modeling based on dynamic-range statistics and frequency separability enables network architectures to more robustly accommodate multi-modal nighttime corruption spanning haze, noise, glow, and non-uniform illumination.
For downstream applications in autonomous driving, surveillance, and transportation, enhanced visibility with high-fidelity color and detail under variable nighttime conditions directly translates into safer, more reliable perception systems.
On the theoretical front, HistoFusionNet suggests a broader direction for image restoration in adverse conditions: context-sensitive attention (dynamically defined by image statistics rather than fixed grids) and frequency-adaptive post-refinement are both crucial and complementary for robust restoration.
Future Directions
- Integration with task-adaptive perception systems, such as object detection under adverse nighttime conditions.
- Exploration of histogram-guided or frequency-adaptive modules in other restoration tasks (e.g., low-light enhancement, deraining, or joint dehazing/denoising approaches).
- Further optimization of module efficiency for real-time, edge deployment, potentially via quantized or hardware-friendly variants of histogram transformer and frequency-adaptive blocks.
- Investigation into unsupervised or unpaired learning protocols aligned with the statistical priors encoded in HistoFusionNet, enhancing real-world generalization.
Conclusion
HistoFusionNet establishes a robust paradigm for nighttime image dehazing by unifying DCNv4-based multi-scale encoding, histogram-based grouping for dynamic-range awareness, and explicit cross-frequency refinement. Its empirical success across multiple datasets and its leading performance in competitive benchmarks underscore the value of both dynamic-range-driven attention mechanisms and frequency-selective enhancement for real-world, illumination-variant image restoration.