- The paper introduces a Bayesian beamspace denoising framework that classifies each beam through a single-pass hard threshold derived via a Bernoulli-complex Gaussian model.
- It employs hardware-friendly blind estimators for composite noise, channel power, and activity rate, achieving near-optimal MSE and BER with low-resolution ADCs.
- The design features an efficient VLSI architecture on FPGA, reducing computational cost and latency for real-time mmWave massive MIMO processing.
Low-Complexity Beamspace Channel Denoiser for mmWave Massive MIMO with Low-Resolution ADCs
Background and Motivation
Millimeter-wave (mmWave) massive MIMO architectures are a central enabler for high-throughput, low-latency wireless communications in next-generation networks. The signal propagation characteristics at mmWave frequencies induce highly sparse spatial channels, but the hardware cost and power consumption scale unfavorably with higher ADC resolution and increasing antenna counts. To address these constraints, low-resolution ADC architectures are widely considered, but quantization artifacts significantly harm conventional channel estimation algorithms. Existing approaches—including nonlinear quantization models, AQNM-based estimators, and deep learning-based methods—either entail high computational complexity, require iterative tuning, or are unsuitable for hardware implementation under tight resource and latency budgets.
Bayesian Beamspace Denoising with Composite Quantization Noise
The proposed framework models mmWave beamspace channel denoising as a Bayesian binary hypothesis test under a Bernoulli-complex Gaussian prior. By leveraging the sparsity of the beamspace channel, each beam is classified as either noise-dominant or signal-dominant through a single-pass hard-thresholding rule, derived analytically. A distinguishing feature is the composite noise model: both thermal noise and quantization noise—typically nonlinear and signal-dependent—are jointly approximated as zero-mean complex Gaussian through AQNM and empirical validation.
After the beamspace transformation, the real and imaginary components of the quantization noise exhibit empirical distributions that closely match the zero-mean Gaussian assumption.

Figure 1: Distribution of normalized real and imaginary quantization noise in the beamspace domain approximated by a Gaussian.
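To make the composite noise model concrete, the following minimal Python sketch evaluates the per-element noise variance and the resulting SDNR under the standard AQNM, y_q = α(h + n) + n_q. The distortion factors are the values commonly tabulated in the AQNM literature for Gaussian inputs. The function names and the per-antenna scalar setting are illustrative assumptions, not the paper's notation.

```python
import math

# Commonly tabulated AQNM distortion factors beta for a b-bit quantizer
# with Gaussian input; alpha = 1 - beta is the quantization gain.
AQNM_BETA = {1: 0.3634, 2: 0.1175, 3: 0.03454, 4: 0.009497, 5: 0.002499}


def aqnm_beta(bits: int) -> float:
    """Distortion factor; uses the high-resolution approximation above 5 bits."""
    if bits in AQNM_BETA:
        return AQNM_BETA[bits]
    return math.pi * math.sqrt(3.0) / 2.0 * 2.0 ** (-2 * bits)


def composite_noise_variance(bits: int, signal_power: float, thermal_var: float) -> float:
    """Variance of thermal plus quantization noise after the ADC (illustrative).

    AQNM: y_q = alpha * (h + n) + n_q, with
    var(n_q) = alpha * (1 - alpha) * (signal_power + thermal_var).
    """
    alpha = 1.0 - aqnm_beta(bits)
    quant_var = alpha * (1.0 - alpha) * (signal_power + thermal_var)
    return alpha ** 2 * thermal_var + quant_var


def sdnr(bits: int, signal_power: float, thermal_var: float) -> float:
    """Signal power (scaled by alpha^2) over composite noise variance."""
    alpha = 1.0 - aqnm_beta(bits)
    return alpha ** 2 * signal_power / composite_noise_variance(bits, signal_power, thermal_var)
```

Under this model the SDNR saturates at α/(1 − α) as the thermal noise vanishes, which is one way to see why quantization noise dominates at high SNR with few bits.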
The channel observation $\bar{h}'$ is modeled as a mixture under the hypotheses $H_1$ (signal-plus-noise) and $H_0$ (noise-only). The hypothesis priors are set by the activity rate $q$, the fraction of nonzero channel taps. The Bayesian decision rule yields a closed-form threshold on $|\bar{h}'_m|^2$ for classifying each element, parameterized by the composite noise variance, the estimated signal-to-distortion-plus-noise ratio (SDNR), and $q$.
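As an illustration of how such a threshold arises, the sketch below implements the textbook MAP test for a Bernoulli-complex Gaussian prior: with noise variance σ², per-active-beam SDNR γ, and activity rate q, comparing q·CN(0, σ²(1 + γ)) against (1 − q)·CN(0, σ²) gives a threshold of σ²·(1 + γ)/γ·ln((1 − q)(1 + γ)/q) on the squared magnitude. This is a sketch under those assumptions; the paper's exact expression is not reproduced in this summary and may differ, and the function names are hypothetical.

```python
import math


def map_threshold(noise_var: float, sdnr: float, q: float) -> float:
    """Threshold on |y|^2 above which H1 is more probable than H0.

    Assumes H0: y ~ CN(0, noise_var) and
            H1: y ~ CN(0, noise_var * (1 + sdnr)), with P(H1) = q.
    """
    if not 0.0 < q < 1.0 or sdnr <= 0.0 or noise_var <= 0.0:
        raise ValueError("need 0 < q < 1, sdnr > 0, noise_var > 0")
    tau = noise_var * (1.0 + sdnr) / sdnr * math.log((1.0 - q) * (1.0 + sdnr) / q)
    # A negative value means H1 wins for every observation, so clamp at zero.
    return max(tau, 0.0)


def hard_threshold(beams, tau: float):
    """Single pass: keep signal-dominant beams, zero noise-dominant ones."""
    return [b if abs(b) ** 2 > tau else 0j for b in beams]
```

For example, `hard_threshold([3 + 0j, 1 + 1j, 0.5j], map_threshold(1.0, 9.0, 0.1))` keeps only the first beam, since the threshold is close to 5 and the other two squared magnitudes are 2 and 0.25.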
Efficient Blind Parameter Estimation
To ensure full autonomy and low-latency operation, the framework employs a suite of hardware-friendly blind estimators for the composite noise power, the channel power, the SDNR, and the activity rate.
All steps incur linear computational cost in M (the number of antennas), except for the unitary FFT/IFFT operations required for the beamspace transformation, which cost O(M log M).
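The architecture description later in this summary mentions a noise power estimator built from sorting and truncated-mean blocks. A minimal sketch of that general idea follows, assuming the smallest beam powers are noise-only and exponentially distributed (the squared magnitude of a complex Gaussian). The keep fraction, the bias correction, and all names are illustrative assumptions rather than the paper's estimator, and the full sort is used only for brevity.

```python
import math
import random


def truncated_mean_noise_power(powers, keep_fraction: float = 0.5) -> float:
    """Blind noise power estimate from the smallest beam powers."""
    if not 0.0 < keep_fraction < 1.0:
        raise ValueError("keep_fraction must lie strictly between 0 and 1")
    ordered = sorted(powers)
    k = max(1, int(len(ordered) * keep_fraction))
    trunc_mean = sum(ordered[:k]) / k
    p = keep_fraction
    # Mean of the lowest fraction p of a unit-mean exponential variable;
    # dividing by it undoes the downward bias of the truncation.
    bias = (1.0 - (1.0 - p) * (1.0 - math.log(1.0 - p))) / p
    return trunc_mean / bias


def simulate_beam_powers(m: int, q: float, noise_var: float,
                         channel_var: float, seed: int = 0):
    """Squared magnitudes of a synthetic Bernoulli-complex Gaussian vector."""
    rng = random.Random(seed)
    powers = []
    for _ in range(m):
        var = noise_var + (channel_var if rng.random() < q else 0.0)
        s = math.sqrt(var / 2.0)  # real and imaginary parts each carry var / 2
        powers.append(rng.gauss(0.0, s) ** 2 + rng.gauss(0.0, s) ** 2)
    return powers
```

Because a few percent of the beams are active, the kept samples cover slightly more than the assumed fraction of the noise-only population, so the estimate is biased slightly upward for sparse channels. A hardware design would likely replace the full sort with a partial selection network.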
The proposed denoiser is directly compared with state-of-the-art model-based, optimization-based, and deep learning-based estimators across several metrics:
- Channel MSE and Symbol BER: With 3-bit ADCs, the method matches or outperforms LS, Bussgang-LMMSE, and α-BEACHES (another beamspace denoising approach), and incurs only negligible MSE/BER degradation compared to GL-QVBCE (variational Bayesian) and diffusion-model estimators, while requiring vastly lower computation. The improvements over LS grow with reduced ADC precision, highlighting the effective mitigation of quantization noise in hardware-constrained operation.

Figure 3: MSE and post-equalization BER as a function of SNR for the proposed denoiser and multiple baseline estimators with 3-bit ADCs.
- Resolution Scaling Behavior: The SNR gain achieved by the denoiser relative to the raw observation grows as ADC precision drops. At 2–3 bits, the gain reaches 3–5 dB at a fixed BER, and it shrinks as the ADCs approach the high-resolution regime.

Figure 4: Performance gain (MSE and BER) of the proposed denoising algorithm over baseline as a function of SNR and ADC resolution.
- Complexity: The algorithm avoids matrix inversion, iterative coordinate descent, and neural inference. Computational cost is near-linear, and measured run times (Python prototype) are 2–3 orders of magnitude below those of learning-based or iterative baselines. With an SNR prior available, the complexity reduces to H10.
VLSI Architecture and FPGA Implementation
A full VLSI processing chain is constructed, optimized for high-throughput, low-latency deployment. The architecture is modular:
Figure 6: Detailed architecture of the composite noise power estimator comprising sorting and truncated mean blocks.
Figure 7: Auxiliary estimator units for channel power, SDNR, and activity rate, with direct use of pipelined sum, division, and LUT-based nonlinear functions.
Figure 8: Threshold computation unit with piecewise linear/logarithmic approximation blocks highlighted in gray.
Figure 9: Flow of the denoising unit: buffer, comparator, and LUT-based scaling.
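The threshold unit needs a logarithm, which is costly in fixed-point logic. As one example of a piecewise-linear logarithm approximation, the sketch below uses Mitchell's classic method (leading-one detection plus a linear mantissa term). This is a swapped-in, well-known technique shown for illustration; the summary does not state which approximation the paper's hardware uses, and the function names are hypothetical.

```python
import math


def mitchell_log2(x: int) -> float:
    """Mitchell's approximation: log2(2^e * (1 + f)) ~= e + f for 0 <= f < 1."""
    if x <= 0:
        raise ValueError("x must be a positive integer")
    exponent = x.bit_length() - 1  # position of the leading one
    # Fractional mantissa; hardware would keep this as fixed-point bits.
    mantissa = (x - (1 << exponent)) / (1 << exponent)
    return exponent + mantissa


def approx_ln(x: int) -> float:
    """Natural logarithm from the base-2 approximation via a constant multiply."""
    return mitchell_log2(x) * math.log(2.0)
```

The approximation is exact at powers of two and underestimates log2 by at most about 0.086 in between, an error that a small LUT correction can reduce if required.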
The design is implemented on a Xilinx Kintex UltraScale+ FPGA. Across H11 and H12 (antennas), the architecture uses less than 6% of LUT and 4% of flip-flop resources at over 300 MHz. Latency is 2–4 H13s per vector. Full system processing (16 users) is H14s, substantially below mmWave channel coherence times.
Key hardware results:
- Sublinear scaling in latency and resource use with respect to H15.
- 35% further acceleration when SNR is known a priori, by omitting all auxiliary estimator logic.
- Lower absolute latency and resource use than published FPGA implementations of iterative or learning-based estimators, in particular the only prior one that supports low-resolution ADC scenarios.
Implications and Future Directions
On the practical side, this framework offers a viable route for integrating massive MIMO signal processing into hardware platforms constrained by tight area and power budgets. This is particularly relevant for mmWave access points, small-cell relays, or edge compute platforms. The full pipeline matches the performance of high-complexity baselines, yet is implementable as fixed-point hard-wired logic with sub-millisecond latency.
Theoretically, the successful adoption of a Bernoulli-complex Gaussian surrogate prior and composite Gaussian modeling for quantization noise—validated by empirical distributional fits—supports the utility of simple statistical surrogates in hardware-efficient signal processing, provided their limitations are well characterized and mitigated by robust estimation.
Potential further work includes joint channel/data iterative schemes that preserve linear cost per iteration, improved second-order parameter estimation under non-Gaussian hardware regimes, and algorithm-hardware co-design for rapidly reconfigurable antenna arrays.
Conclusion
This work establishes that Bayesian hard-thresholding denoising in beamspace with robust blind parameter estimation provides a uniquely favorable tradeoff: near-optimal estimation accuracy at minimal computational and hardware cost in low-resolution mmWave MIMO systems. The architecture is deployable today in both FPGA and ASIC form, supporting real-time, power-efficient operation in future massive MIMO infrastructures (2605.08855).