HistoFusionNet: Histogram-Guided Fusion and Frequency-Adaptive Refinement for Nighttime Image Dehazing

Published 4 Apr 2026 in cs.CV | (2604.03800v1)

Abstract: Nighttime image dehazing remains a challenging low-level vision problem due to the joint presence of haze, glow, non-uniform illumination, color distortion, and sensor noise, which often invalidate assumptions commonly used in daytime dehazing. To address these challenges, we propose HistoFusionNet, a transformer-enhanced architecture tailored for nighttime image dehazing by combining histogram-guided representation learning with frequency-adaptive feature refinement. Built upon a multi-scale encoder-decoder backbone, our method introduces histogram transformer blocks that model long-range dependencies by grouping features according to their dynamic-range characteristics, enabling more effective aggregation of similarly degraded regions under complex nighttime lighting. To further improve restoration fidelity, we incorporate a frequency-aware refinement branch that adaptively exploits complementary low- and high-frequency cues, helping recover scene structures, suppress artifacts, and enhance local details. This design yields a unified framework that is particularly well suited to the heterogeneous degradations encountered in real nighttime hazy scenes. Extensive experiments and highly competitive performance of our method on the NTIRE 2026 Nighttime Image Dehazing Challenge benchmark demonstrate the effectiveness of the proposed method. Our team ranked 1st among 22 participating teams, highlighting the robustness and competitive performance of HistoFusionNet. The code is available at: https://github.com/heydarimo/Night-Time-Dehazing


Authors (5)

Summary

The paper presents a unified transformer U-net that leverages histogram-guided self-attention and frequency-adaptive refinement to effectively tackle nighttime haze and noise.
It integrates a DCNv4-based multi-scale backbone with an auxiliary frequency-aware branch, achieving higher PSNR and SSIM than previous state-of-the-art methods.
Extensive experiments validate enhanced color fidelity, structural clarity, and robust dehazing performance, earning top ranking in the NTIRE 2026 challenge.

Introduction

Nighttime image dehazing presents significant technical challenges due to the concurrent presence of haze, glow, non-uniform illumination, color distortion, and sensor noise in low-light environments. Conventional dehazing approaches—primarily based on the atmospheric scattering model (ASM) and leveraging handcrafted priors—invariably underperform on nighttime scenes, where model assumptions are routinely violated by heterogeneous and locally variable degradations. While the advent of CNN- and Transformer-based architectures has considerably advanced daytime and general dehazing, their efficacy is likewise limited on real nighttime datasets due to an inability to explicitly model dynamic-range and frequency-specific corruption.

The paper "HistoFusionNet: Histogram-Guided Fusion and Frequency-Adaptive Refinement for Nighttime Image Dehazing" (2604.03800) proposes an architecture that integrates histogram-guided representation learning and frequency-adaptive refinement into a single transformer-enhanced, U-shaped encoder-decoder framework. HistoFusionNet combines a DCNv4-based main branch, histogram transformer blocks at the bottleneck, an auxiliary frequency-aware branch, and a lightweight frequency-adaptive refinement module, all tailored to the mixed degradations and complex illumination of nighttime scenes.

Architectural Contributions

HistoFusionNet is architected around two main concepts: (1) dynamic-range-aware modeling via histogram-guided self-attention, and (2) cross-frequency feature refinement to enhance structural and color fidelity.

DCNv4-based Multi-Scale Backbone:

The encoder-decoder backbone uses DCNv4 for main-branch feature extraction. DCNv4 drops the softmax normalization that DCNv3 applies to its aggregation weights and offers improved efficiency and convergence, which suits the spatially variant noise and haze patterns of nighttime scenes.
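To illustrate what removing the softmax changes, the following is a minimal NumPy sketch of the aggregation step only. It is not the DCNv4 implementation: offset prediction and bilinear sampling are omitted, and the function name `aggregate_samples` and its arguments are hypothetical. It only contrasts weights normalized over the K sampling points with unbounded weights.

```python
import numpy as np

def aggregate_samples(samples, raw_weights, use_softmax):
    """Combine K sampled feature vectors into one output vector.

    samples:     (K, C) features assumed to be already gathered at the
                 K sampling points (offsets and sampling are omitted).
    raw_weights: (K,) unnormalized aggregation weights.
    use_softmax: True normalizes the weights over K (DCNv3-style);
                 False uses them as-is (the softmax-free variant).
    """
    samples = np.asarray(samples, dtype=np.float64)
    weights = np.asarray(raw_weights, dtype=np.float64)
    if use_softmax:
        # Numerically stable softmax over the K sampling points.
        weights = np.exp(weights - weights.max())
        weights = weights / weights.sum()
    return weights @ samples

samples = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
raw = np.array([2.0, -1.0, 0.5])
bounded = aggregate_samples(samples, raw, use_softmax=True)
unbounded = aggregate_samples(samples, raw, use_softmax=False)
```

With softmax, the output is a convex combination of the sampled features and stays within their range. Without it, weights can be negative or exceed one, so the output is not confined to that range, much like an ordinary convolution kernel.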

Histogram Transformer Blocks:

Inserted at the bottleneck, these blocks perform self-attention not globally or locally, but within groups of tokens possessing similar intensity characteristics, as defined by their dynamic-range histograms. By sorting and partitioning latent features into $B$ bins according to per-token intensity and attending within bins, HistoFusionNet directly models long-range correlations among similarly degraded regions, enhancing restoration in the presence of non-uniform illumination and glow.
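The sort, partition, and attend-within-bin steps can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the authors' block: per-token intensity is taken to be the channel mean, the query/key/value projections are replaced by the identity, a single head is used, and the name `histogram_binned_attention` is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def histogram_binned_attention(tokens, num_bins):
    """Self-attention restricted to bins of similar-intensity tokens.

    tokens:   (N, C) array of latent tokens; N must be divisible by num_bins.
    num_bins: number of bins B.
    """
    tokens = np.asarray(tokens, dtype=np.float64)
    n, c = tokens.shape
    assert n % num_bins == 0, "N must be divisible by the number of bins"

    # Assumption: a token's intensity is the mean over its channels.
    intensity = tokens.mean(axis=1)
    order = np.argsort(intensity, kind="stable")

    # Sort tokens by intensity and split them into B equal-sized bins.
    bins = tokens[order].reshape(num_bins, n // num_bins, c)

    # Scaled dot-product attention inside each bin (identity Q/K/V).
    scores = bins @ bins.transpose(0, 2, 1) / np.sqrt(c)
    attended = softmax(scores, axis=-1) @ bins

    # Scatter results back to the original token order.
    out = np.empty_like(tokens)
    out[order] = attended.reshape(n, c)
    return out
```

Because grouping depends on intensity rather than position, two tokens from distant parts of the image can attend to each other when they fall in the same bin, while tokens in different bins never interact.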

Auxiliary Frequency-Aware Branch:

This complementary branch, inspired by spatial-frequency interaction networks, exploits frequency-domain statistics and passes them through skip connections as contextual guidance, reducing restoration failures in both the high- and low-frequency bands.

Frequency-Adaptive Refinement Module:

In a second optimization stage, the lightweight refinement module uses Fourier transforms to decompose features extracted from the coarse dehazed output into low- and high-frequency components. Adaptive mixing at multiple decoder scales, together with guided fusion using learnable per-scale coefficients, balances encoder and decoder representations. This allows targeted residual enhancement, which helps recover the fine structural textures and faithful colors that are often lost in primary dehazing.
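A rough NumPy sketch of the Fourier-based split and a scalar-weighted fusion is given below. The details are assumptions made for illustration: the radial cutoff mask, the scalar coefficients `alpha`, `beta`, and `lam` (which would be learned per scale in a trained model), and the function names `split_frequencies` and `refine` are all hypothetical and not taken from the paper.

```python
import numpy as np

def split_frequencies(feature, cutoff=0.1):
    """Split a 2D feature map into low- and high-frequency parts.

    feature: (H, W) array.
    cutoff:  radial cutoff in cycles per sample (assumed value).
    Returns (low, high) with low + high equal to the input.
    """
    feature = np.asarray(feature, dtype=np.float64)
    h, w = feature.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    # Keep only frequencies within the cutoff radius for the low band.
    low_mask = np.sqrt(fy ** 2 + fx ** 2) <= cutoff
    low = np.fft.ifft2(np.fft.fft2(feature) * low_mask).real
    high = feature - low
    return low, high

def refine(encoder_feat, decoder_feat, alpha, beta, lam, cutoff=0.1):
    """Reweight the decoder's frequency bands, then blend with the encoder.

    alpha, beta: weights for the low and high bands (assumed scalars).
    lam:         fusion coefficient in [0, 1] between encoder and decoder.
    """
    low, high = split_frequencies(decoder_feat, cutoff)
    mixed = alpha * low + beta * high
    return lam * encoder_feat + (1.0 - lam) * mixed
```

Setting `beta` above `alpha` emphasizes edges and texture, while the reverse favors smooth illumination and color. With `alpha = beta = 1` and `lam = 0` the decoder feature passes through unchanged.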

Figure 1: Visual comparisons on the NH-HAZE dataset. Compared to other models, HistoFusionNet exhibits higher color fidelity and more effective dehazing.

Experimental Results

Datasets and Baselines:

Evaluations are conducted on four real-world datasets: NTIRE 2026 Nighttime Image Dehazing Challenge, NH-HAZE, NH-HAZE2, and Dense-Haze. SOTA comparison methods include SFSNiD, SFMN, DWT-FFC, and DehazeDCT.

Numerical Performance:

HistoFusionNet achieves the highest PSNR and SSIM values on all four datasets. For instance, it reaches 27.879 dB PSNR and 0.905 SSIM on the NTIRE 2026 dataset, outperforming DehazeDCT by 0.414 dB PSNR and 0.006 SSIM. Consistent improvements are also observed on NH-HAZE, NH-HAZE2, and Dense-Haze.
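For readers unfamiliar with the metric, PSNR follows from the mean squared error between the restored image and the ground truth. The sketch below uses the standard definition; the function name `psnr` and the assumption that pixel values lie in [0, 1] are illustrative, and the paper's exact evaluation protocol may differ.

```python
import numpy as np

def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    max_value is the largest possible pixel value (1.0 assumed here).
    """
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```

Because PSNR is logarithmic in the error, a gain of 0.414 dB on a single image pair corresponds to an MSE that is roughly 9% lower.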

Visual Quality:

The qualitative figures show better color preservation, haze removal, and structural detail restoration than the compared methods, particularly on the NH-HAZE2 and Dense-Haze datasets.

Figure 2: Visual results on the NH-HAZE2 dataset. The model demonstrates superior color preservation and detail retention.

Figure 3: Qualitative comparison on the Dense-Haze dataset. The method yields clearer structures, improved color fidelity, and superior detail recovery.

Challenge Results:

HistoFusionNet ranked 1st in the NTIRE 2026 Nighttime Image Dehazing Challenge among 22 teams, achieving the best scores in PSNR (27.88 dB), SSIM (0.91), and LPIPS (0.10), along with competitive scores on perceptual quality metrics (MUSIQ, NIQE, FID).

Figure 4: Results on the validation set of the NTIRE 2026 Challenge, demonstrating improved visibility, structural clarity, and color restoration.

Ablation Studies

Ablation on the Dense-Haze dataset quantifies the additive value of each module:

Histogram transformer removal degrades PSNR/SSIM, evidencing the necessity of dynamic-range-aware attention for global degradation correlation.
Exclusion of the frequency-adaptive refinement or auxiliary frequency-aware branch also decreases performance, confirming the efficacy of explicit frequency modulation for detail and color enhancement.
Each architectural component demonstrably contributes to final dehazing performance.

Theoretical and Practical Implications

The explicit coupling of histogram-guided grouping and frequency-adaptive mechanisms sheds light on the inadequacies of traditional global attention and purely spatial CNNs for restoration under night conditions. Modeling based on dynamic-range statistics and frequency separability enables network architectures to more robustly accommodate multi-modal nighttime corruption—spanning haze, noise, glow, and non-uniform illumination.

For downstream applications in autonomous driving, surveillance, and transportation, enhanced visibility with high-fidelity color and detail under variable nighttime conditions directly translates into safer, more reliable perception systems.

On the theoretical front, HistoFusionNet suggests a broader direction for image restoration in adverse conditions: context-sensitive attention (dynamically defined by image statistics rather than fixed grids) and frequency-adaptive post-refinement are both crucial and complementary for robust restoration.

Future Directions

Integration with task-adaptive perception systems, such as object detection under adverse nighttime conditions.
Exploration of histogram-guided or frequency-adaptive modules in other restoration tasks (e.g., low-light enhancement, deraining, or joint dehazing/denoising approaches).
Further optimization of module efficiency for real-time, edge deployment, potentially via quantized or hardware-friendly variants of histogram transformer and frequency-adaptive blocks.
Investigation into unsupervised or unpaired learning protocols aligned with the statistical priors encoded in HistoFusionNet, enhancing real-world generalization.

Conclusion

HistoFusionNet establishes a robust approach to nighttime image dehazing by unifying DCNv4-based multi-scale encoding, histogram-based grouping for dynamic-range awareness, and explicit cross-frequency refinement. Its empirical success across multiple datasets and its leading performance in competitive benchmarks underscore the value of both dynamic-range-driven attention mechanisms and frequency-selective enhancement for real-world, illumination-variant image restoration.

