Neural Harmonic Textures for High-Quality Primitive Based Neural Reconstruction

Published 1 Apr 2026 in cs.CV, cs.AI, cs.GR, and cs.LG | (2604.01204v1)

Abstract: Primitive-based methods such as 3D Gaussian Splatting have recently become the state-of-the-art for novel-view synthesis and related reconstruction tasks. Compared to neural fields, these representations are more flexible, adaptive, and scale better to large scenes. However, the limited expressivity of individual primitives makes modeling high-frequency detail challenging. We introduce Neural Harmonic Textures, a neural representation approach that anchors latent feature vectors on a virtual scaffold surrounding each primitive. These features are interpolated within the primitive at ray intersection points. Inspired by Fourier analysis, we apply periodic activations to the interpolated features, turning alpha blending into a weighted sum of harmonic components. The resulting signal is then decoded in a single deferred pass using a small neural network, significantly reducing computational cost. Neural Harmonic Textures yield state-of-the-art results in real-time novel view synthesis while bridging the gap between primitive- and neural-field-based reconstruction. Our method integrates seamlessly into existing primitive-based pipelines such as 3DGUT, Triangle Splatting, and 2DGS. We further demonstrate its generality with applications to 2D image fitting and semantic reconstruction.


Authors (6)

Summary

The paper introduces Neural Harmonic Textures (NHT), a method that attaches learnable feature vectors to local primitive scaffolds to boost expressive capacity.
It employs harmonic encoding with periodic activations to decompose and reconstruct high-frequency details efficiently through deferred neural decoding.
Empirical results across 3D and 2D benchmarks demonstrate significant improvements in PSNR and LPIPS, validating NHT's scalability and applicability in various domains.

Neural Harmonic Textures: Enhancing Primitive-Based Neural Reconstruction

Introduction and Context

Neural Harmonic Textures (NHT), introduced in "Neural Harmonic Textures for High-Quality Primitive Based Neural Reconstruction" (2604.01204), represent a significant advance in hybridizing primitive-based and neural field representations for real-time novel view synthesis. Primitive-based methods such as 3D Gaussian Splatting (3DGS) have proven effective as scalable, adaptive scene representations, but they are fundamentally limited by the expressiveness of individual primitives, which creates a bottleneck when modeling high-frequency spatial and directional detail. Neural field techniques, on the other hand, leverage expressive learnable encodings and powerful decoders but suffer from limited scalability, poor editability, and weak locality. NHT bridges this gap, boosting per-primitive expressive capacity without sacrificing the advantages of explicit, Lagrangian primitives.

Figure 1: NHT attaches learnable feature vectors to virtual tetrahedral scaffolds encapsulating each primitive, with decoded color produced by harmonically encoded feature accumulation followed by a deferred neural image-space mapping.

Methodology

Primitive-Bound Latent Encodings

The core innovation of NHT is the attachment of latent feature vectors to local, primitive-centric virtual scaffolds. Specifically, each geometric primitive (e.g., a 3D Gaussian or triangle) is encapsulated in its canonical (often whitened) space by a virtual tetrahedron (or triangle in 2D). Each vertex of this scaffold holds a learnable feature vector. During rendering, at ray–primitive intersections, these features are interpolated (barycentrically in canonical space) to extract a local feature embedding at the intersection point, effectively creating a Lagrangian positional encoding that adapts with geometry during motion or editing.
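The interpolation step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the primitive is taken to be a Gaussian with mean, rotation, and per-axis scale; "canonical space" is assumed to mean the whitened frame; and the scaffold is assumed to be a regular tetrahedron whose inscribed sphere is the unit sphere. The names `to_canonical`, `barycentric`, `interpolate_features`, and `TET` are hypothetical.

```python
import math

# Hypothetical scaffold: a regular tetrahedron whose inscribed sphere is the
# unit sphere of the primitive's canonical (whitened) space.
K = math.sqrt(3.0)
TET = [(K, K, K), (K, -K, -K), (-K, K, -K), (-K, -K, K)]


def to_canonical(x, mean, rot, scale):
    """Map a world-space point into the primitive's whitened frame.

    rot is a 3x3 rotation (list of rows) whose columns are the primitive's
    principal axes; scale holds the per-axis extents (e.g. std. deviations).
    """
    d = [x[i] - mean[i] for i in range(3)]
    # Compute R^T d, then divide each local axis by its scale.
    local = [sum(rot[r][c] * d[r] for r in range(3)) for c in range(3)]
    return [local[c] / scale[c] for c in range(3)]


def barycentric(u):
    """Barycentric coordinates of canonical point u with respect to TET.

    For this tetrahedron the vertices sum to zero, |w_i|^2 = 3K^2 and
    w_i . w_j = -K^2 (i != j), which yields the closed form
    lambda_j = 1/4 + (u . w_j) / (4 K^2).
    """
    k2 = K * K
    return [0.25 + sum(u[i] * w[i] for i in range(3)) / (4.0 * k2) for w in TET]


def interpolate_features(u, vertex_features):
    """Blend the four per-vertex latent vectors at canonical point u."""
    lam = barycentric(u)
    dim = len(vertex_features[0])
    return [sum(lam[v] * vertex_features[v][d] for v in range(4)) for d in range(dim)]
```

Because the features live in the primitive's own frame, moving, rotating, or rescaling the primitive changes only `mean`, `rot`, and `scale`; the learned vertex features travel with it, which is the Lagrangian property described above.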

Figure 2: 2D illustration of the scaffold: each primitive's ellipsoidal support becomes a sphere in canonical space, where features are attached to the vertices of an enclosing scaffold (a tetrahedron in 3D, a triangle in 2D) and interpolated.

Harmonic Encoding and Signal Decomposition

After interpolation, rather than decoding each feature directly with an MLP, NHT applies a harmonic (Fourier-inspired) encoding: the interpolated feature vector is passed through periodic (sine and cosine) activations. Because the interpolated features vary smoothly across the primitive, the activated channels can be interpreted as local sinusoidal components whose frequencies are governed by the learned features, so the mechanism acts as a local harmonic decomposition that lets each primitive express higher-frequency variation. The harmonically encoded features are then accumulated along the camera ray with standard alpha blending, which turns compositing into a weighted sum of harmonic components.
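A minimal sketch of the harmonic activation followed by front-to-back alpha compositing is shown below. It assumes the activation is a plain concatenation of sin and cos of each channel with no learned frequency scaling; the exact form used in the paper may differ, and `harmonic` and `composite` are hypothetical names. Each sample's weight is its alpha times the transmittance accumulated in front of it, so the output is literally a weighted sum of sinusoidal terms.

```python
import math


def harmonic(features):
    """Periodic activation (assumed form): concatenate sin and cos per channel."""
    return [math.sin(f) for f in features] + [math.cos(f) for f in features]


def composite(samples, dim):
    """Front-to-back alpha blending of harmonically activated features.

    samples: list of (alpha, features) pairs sorted front to back, where each
    features list has length dim. Returns (accumulated, transmittance), with
    accumulated = sum_i T_i * alpha_i * harmonic(f_i) and
    T_i = prod_{j<i} (1 - alpha_j).
    """
    acc = [0.0] * (2 * dim)
    transmittance = 1.0
    for alpha, feats in samples:
        weight = transmittance * alpha
        acc = [a + weight * h for a, h in zip(acc, harmonic(feats))]
        transmittance *= 1.0 - alpha
    return acc, transmittance
```

Note that the decoder is not called here at all: only cheap trigonometric functions and multiply-adds run per intersection.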

Deferred Neural Decoding

Instead of invoking the neural decoder at every ray–primitive intersection, NHT decodes the accumulated harmonics in a single deferred pass in image space. For each pixel, a lightweight MLP maps the composited sum of harmonically activated features, together with the view direction encoded in spherical harmonics, to RGB color. This design reduces computational load (one MLP evaluation per pixel rather than one per intersection) while supporting higher expressivity, in the spirit of deferred shading architectures.
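The deferred pass can be sketched as follows, assuming a degree-1 real spherical-harmonic encoding of the view direction and a one-hidden-layer ReLU MLP with a sigmoid output. The SH degree, layer sizes, random initialization, and the names `sh_encode_deg1`, `TinyMLP`, and `decode_image` are illustrative assumptions, not the paper's configuration. With N intersections per pixel, per-intersection decoding would call the MLP N times; here it is called exactly once per pixel.

```python
import math
import random

SH_C0 = 0.28209479177387814  # 1 / (2 sqrt(pi))
SH_C1 = 0.4886025119029199   # sqrt(3) / (2 sqrt(pi))


def sh_encode_deg1(d):
    """Real SH basis up to degree 1 for a unit direction d (sign conventions vary)."""
    x, y, z = d
    return [SH_C0, SH_C1 * y, SH_C1 * z, SH_C1 * x]


class TinyMLP:
    """Hypothetical decoder: one hidden ReLU layer, sigmoid RGB output."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def __call__(self, x):
        h = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        o = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return [1.0 / (1.0 + math.exp(-v)) for v in o]


def decode_image(accumulated, view_dirs, mlp):
    """Deferred pass: one decoder call per pixel, however many primitives
    contributed to that pixel's accumulated feature vector."""
    return [mlp(list(feat) + sh_encode_deg1(d)) for feat, d in zip(accumulated, view_dirs)]
```

For example, `decode_image(acc, dirs, TinyMLP(n_in=8 + 4, n_hidden=16, n_out=3))` decodes 8-channel accumulated features per pixel into RGB values in (0, 1).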

Figure 3: At query time, harmonically encoded, interpolated per-primitive features are alpha-blended along the ray, and decoded via a shallow MLP in image space.

Empirical Evaluation

Radiance Field Synthesis

NHT is evaluated on standard benchmarks: MipNeRF360 [barron2022mipnerf360], Tanks and Temples, and Deep Blending. Across a range of primitive counts, NHT consistently improves over Spherical Harmonics (SH), Spherical Voronoi, and state-of-the-art appearance decoding schemes in both photometric and perceptual metrics (PSNR, SSIM, LPIPS). The gains are most pronounced for high-frequency detail (specular effects, reflections) and in memory-constrained regimes with reduced primitive counts.

Figure 4: Qualitative comparison indicating NHT’s superior handling of high-frequency detail and view-dependent effects in complex real scenes.

In a controlled ablation at equal parameter and primitive budgets, NHT provides +0.3 dB to +0.6 dB PSNR and up to 25% lower LPIPS than the best prior primitive-based methods, at similar or better inference throughput (140+ FPS on high-end hardware).
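To put the PSNR gains in perspective: since PSNR = 10 log10(MAX^2 / MSE), a gain of ΔPSNR decibels corresponds to an MSE ratio of 10^(-ΔPSNR/10). The short calculation below applies this standard definition to the reported range; it says nothing about LPIPS or SSIM, and the helper names are illustrative.

```python
import math


def psnr(mse, max_val=1.0):
    """Peak signal-to-noise ratio in dB for signals in [0, max_val]."""
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_from_delta_psnr(delta_db):
    """MSE_new / MSE_old implied by a PSNR gain of delta_db decibels."""
    return 10.0 ** (-delta_db / 10.0)


for gain in (0.3, 0.6):
    r = mse_ratio_from_delta_psnr(gain)
    print(f"+{gain} dB PSNR -> {100 * (1 - r):.1f}% lower MSE")
# +0.3 dB PSNR -> 6.7% lower MSE
# +0.6 dB PSNR -> 12.9% lower MSE
```

So the reported +0.3 dB to +0.6 dB range corresponds to roughly 7% to 13% lower mean squared error.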

Generality and Alternative Domains

NHT generalizes to other primitive-based representations. When used as the appearance model for 2DGS and Triangle Splatting, it provides consistent quality improvements (ΔPSNR up to +0.8 dB at the same topology and runtime), indicating that the benefits of localized harmonic encoding are not tied to a specific primitive type.

Figure 5: NHT applied to alternative primitive-based methods improves reconstruction quality uniformly across representations.

Semantic Fields and High-Dimensional Signal Fitting

Owing to its deferred decoding, NHT extends readily to high-dimensional signals. In joint RGB and 512-dimensional LSEG semantic field reconstruction, NHT surpasses Feature 3DGS by +1.89 dB PSNR and +0.008 in cosine similarity, with 9× higher inference speed and 3× lower memory usage, demonstrating both computational efficiency and usefulness for downstream tasks such as semantic segmentation.
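The semantic comparison above is reported in terms of cosine similarity between predicted and ground-truth feature vectors. A minimal sketch of that metric follows; averaging the per-pixel similarity over all pixels is an assumption here, since the exact evaluation protocol is not stated in this summary.

```python
import math


def cosine_similarity(a, b, eps=1e-12):
    """Cosine of the angle between two feature vectors (e.g. 512-d LSEG)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / max(na * nb, eps)


def mean_cosine_similarity(pred_pixels, gt_pixels):
    """Average per-pixel cosine similarity between two feature maps, each given
    as a list of D-dimensional vectors (assumed averaging protocol)."""
    sims = [cosine_similarity(p, g) for p, g in zip(pred_pixels, gt_pixels)]
    return sum(sims) / len(sims)
```

Cosine similarity ignores feature magnitude, which matches how embedding features such as LSEG's are typically compared against text embeddings for segmentation.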

Figure 6: Projected PCA visualization comparing ground-truth LSEG features versus NHT-predicted features, showing accurate spatial and semantic alignment.

2D Image Compression

When repurposed for high-resolution 2D image fitting, NHT uses a connected Delaunay mesh and Clough-Tocher $C^1$ feature interpolation with harmonic encoding. On a curated set of 45.7MP HDR RAW images, it achieves perceptually superior reconstructions (38% lower LPIPS at 100× compression versus Instant NGP), while maintaining competitive PSNR. The architecture's parameter efficiency enables further post-training compression (e.g., quantization and entropy coding for up to 331× size reduction), preserving high image fidelity.
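The post-training compression step (quantization followed by entropy coding) can be illustrated with a simple stand-in: uniform scalar quantization of the learned parameters plus an empirical-entropy estimate of the coded size. The 8-bit uniform quantizer, the entropy estimate in place of an actual coder, and the names `quantize`, `dequantize`, `entropy_bits`, and `estimated_ratio` are all assumptions for illustration; this sketch does not reproduce the paper's pipeline or its 331× figure.

```python
import math
from collections import Counter


def quantize(values, bits=8):
    """Uniform scalar quantization of floats to 2**bits levels over [min, max]."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / step) for v in values]
    return codes, lo, step


def dequantize(codes, lo, step):
    """Reconstruct approximate floats; the error per value is at most step / 2."""
    return [lo + c * step for c in codes]


def entropy_bits(codes):
    """Empirical Shannon entropy in bits per symbol: roughly what an ideal
    entropy coder would spend per code, ignoring side information."""
    n = len(codes)
    return -sum(c / n * math.log2(c / n) for c in Counter(codes).values())


def estimated_ratio(values, bits=8):
    """float32 parameter size divided by the estimated entropy-coded size."""
    codes, _, _ = quantize(values, bits)
    coded_bits = entropy_bits(codes) * len(codes)
    return 32 * len(values) / max(coded_bits, 1e-9)
```

Quantization alone caps the gain at 32 / bits over float32; entropy coding adds further savings whenever the quantized codes are unevenly distributed, as learned parameters usually are.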

Figure 7: NHT achieves qualitative and quantitative improvements on compressed HDR images over hash-grid-based neural fields at equivalent or lower bitrates.

Figure 8: Visual comparison at 100× compression; NHT provides significantly lower LPIPS (perceptual error) than baseline methods at matched or better PSNR.

Implications, Limitations, and Prospective Directions

The NHT framework substantially increases the expressive bandwidth of each explicit primitive, decoupling geometry from appearance and enabling high-frequency detail with fewer primitives. Its general signal representation opens the door to jointly learning radiance and semantic feature fields and supports real-time, scalable deployment. Deferred decoding also facilitates variable inference bandwidth, which could suit both desktop and mobile applications.

Some limitations remain: the increased expressive power raises the risk of overfitting under extremely sparse supervision, and inference, while real-time, is modestly slower than non-neural 3DGS when using small appearance models. Future work could address automatic level-of-detail extraction, kernel-free variants for even higher performance, and extensions to neural physically based rendering and radiance caching.

Conclusion

Neural Harmonic Textures (2604.01204) represent a new paradigm for primitive-based neural scene representations, hybridizing the scalability and editability of explicit primitives with the signal capacity of neural fields. By localizing feature embeddings, employing harmonic positional activation, and deferring decoding, NHT simultaneously achieves state-of-the-art reconstruction quality, signal compactness, and broad application generality. This method provides a foundation for future research into flexible, expressive, and efficient neural graphics representations, with promising applications extending into real-time semantic and high-dimensional signal rendering.
