- The paper introduces Neural Harmonic Textures (NHT), a method that attaches learnable feature vectors to local primitive scaffolds to boost expressive capacity.
- It employs harmonic encoding with periodic activations to decompose and reconstruct high-frequency details efficiently through deferred neural decoding.
- Empirical results across 3D and 2D benchmarks demonstrate significant improvements in PSNR and LPIPS, validating NHT's scalability and applicability in various domains.
Neural Harmonic Textures: Enhancing Primitive-Based Neural Reconstruction
Introduction and Context
Neural Harmonic Textures (NHT), introduced in "Neural Harmonic Textures for High-Quality Primitive Based Neural Reconstruction" (2604.01204), represent a significant advance in the hybridization of primitive-based and neural field representations for real-time novel view synthesis. Primitive-based methods such as 3D Gaussian Splatting (3DGS) have proven effective as scalable, adaptive scene representations but are fundamentally limited by the expressiveness of individual primitives, which creates a bottleneck in modeling high-frequency spatial and directional detail. Neural field techniques, on the other hand, leverage expressive learnable encodings and powerful decoders but suffer from limited scalability, limited editability, and poor locality. NHT bridges this gap, boosting per-primitive expressive capacity without sacrificing the advantages of explicit, Lagrangian primitives.
Figure 1: NHT attaches learnable feature vectors to virtual tetrahedral scaffolds encapsulating each primitive, with decoded color produced by harmonically encoded feature accumulation followed by a deferred neural image-space mapping.
Methodology
Primitive-Bound Latent Encodings
The core innovation of NHT is the attachment of latent feature vectors to local, primitive-centric virtual scaffolds. Specifically, each geometric primitive (e.g., a 3D Gaussian or triangle) is encapsulated in its canonical (often whitened) space by a virtual tetrahedron (or triangle in 2D). Each vertex of this scaffold holds a learnable feature vector. During rendering, at ray–primitive intersections, these features are interpolated (barycentrically in canonical space) to extract a local feature embedding at the intersection point, effectively creating a Lagrangian positional encoding that adapts with geometry during motion or editing.
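The interpolation step can be illustrated with a minimal sketch, assuming a single tetrahedral scaffold in canonical space and a two-channel feature per vertex. The function names (`barycentric_weights`, `interpolate_features`) and the example values are hypothetical and are not taken from the paper.

```python
def _sub(u, v):
    """Component-wise difference of two 3D points."""
    return [u[i] - v[i] for i in range(3)]


def _det3(u, v, w):
    """Determinant of the 3x3 matrix whose rows are u, v, w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


def barycentric_weights(point, verts):
    """Barycentric weights of `point` with respect to a tetrahedron.

    Solves point - a = w_b (b - a) + w_c (c - a) + w_d (d - a)
    with Cramer's rule; assumes the tetrahedron is not degenerate.
    """
    a, b, c, d = verts
    ab, ac, ad, ap = _sub(b, a), _sub(c, a), _sub(d, a), _sub(point, a)
    volume = _det3(ab, ac, ad)
    w_b = _det3(ap, ac, ad) / volume
    w_c = _det3(ab, ap, ad) / volume
    w_d = _det3(ab, ac, ap) / volume
    return [1.0 - w_b - w_c - w_d, w_b, w_c, w_d]


def interpolate_features(point, verts, vertex_features):
    """Blend the four per-vertex feature vectors at `point`."""
    weights = barycentric_weights(point, verts)
    dim = len(vertex_features[0])
    return [sum(w * f[k] for w, f in zip(weights, vertex_features))
            for k in range(dim)]


# Hypothetical scaffold and features, for illustration only.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
print(interpolate_features((0.25, 0.25, 0.25), verts, features))  # [0.5, 0.5]
```

Because the weights are computed in the primitive's canonical space, moving or deforming the primitive carries the features with it, which is the sense in which the encoding is Lagrangian.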
Figure 2: In 2D, primitives are encapsulated by ellipsoids which become spheres in canonical space, and features are attached/interpolated on tetrahedral (or triangular) scaffolds.
Harmonic Encoding and Signal Decomposition
After interpolation, instead of decoding each sample directly with an MLP, NHT applies a harmonic (Fourier-inspired) encoding to the local features. The interpolated vector is passed through stacked periodic (sine and cosine) activations, which can be interpreted as modulating the encoded frequencies of the local signal. This periodic activation mechanism acts as a local harmonic decomposition, letting the model express higher-frequency components per primitive. The harmonically encoded features are then accumulated along the camera ray and composited with alpha-weighted sums, analogous to physical light transport models.
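The encode-then-accumulate order described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: the encoding here is a plain per-channel sine and cosine with no frequency scaling, and the samples are assumed to be sorted from near to far. The names `harmonic_encode` and `composite` are hypothetical.

```python
import math


def harmonic_encode(feature):
    """Map each channel f to sin(f) and cos(f) (assumed form of the encoding)."""
    return [math.sin(f) for f in feature] + [math.cos(f) for f in feature]


def composite(samples):
    """Front-to-back alpha compositing of encoded features along one ray.

    `samples` is a non-empty list of (alpha, feature) pairs sorted near to far.
    Returns the accumulated encoded feature and the remaining transmittance.
    """
    dim = 2 * len(samples[0][1])
    accumulated = [0.0] * dim
    transmittance = 1.0
    for alpha, feature in samples:
        encoded = harmonic_encode(feature)
        weight = transmittance * alpha
        accumulated = [a + weight * e for a, e in zip(accumulated, encoded)]
        transmittance *= 1.0 - alpha
    return accumulated, transmittance


# Two hypothetical intersections with one feature channel each.
ray_samples = [(0.5, [0.0]), (1.0, [math.pi / 2])]
accumulated, remaining = composite(ray_samples)
print(accumulated, remaining)
```

Encoding before blending matters: the periodic activations are nonlinear, so compositing encoded features is not equivalent to encoding a composited feature.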
Deferred Neural Decoding
Instead of repeatedly invoking the neural decoder at every ray–primitive intersection, NHT decodes the accumulated harmonics in a single deferred pass in image space. For each pixel, the composited sum of harmonically activated features, together with the view direction encoded with spherical harmonics, is mapped to RGB color by a lightweight MLP. This design, reminiscent of deferred shading architectures, reduces computational load (one MLP evaluation per pixel) while supporting higher expressivity.
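A per-pixel deferred decode might look like the sketch below. Everything specific in it is an assumption made for illustration: a single hidden layer with ReLU, a sigmoid output, a degree-1 real spherical harmonics basis for the view direction, and randomly initialized weights. None of these choices, nor the names `sh_basis_deg1`, `init_params`, and `decode_pixel`, come from the paper.

```python
import math
import random


def sh_basis_deg1(direction):
    """Real spherical harmonics up to degree 1 for a unit direction (x, y, z)."""
    x, y, z = direction
    c0 = math.sqrt(1.0 / (4.0 * math.pi))
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    return [c0, c1 * y, c1 * z, c1 * x]


def init_params(in_dim, hidden_dim, out_dim=3, seed=0):
    """Small random weights for a one-hidden-layer MLP (hypothetical sizes)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {"w1": matrix(hidden_dim, in_dim), "b1": [0.0] * hidden_dim,
            "w2": matrix(out_dim, hidden_dim), "b2": [0.0] * out_dim}


def _linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def decode_pixel(accumulated, view_dir, params):
    """Map one pixel's accumulated features and view direction to RGB in (0, 1)."""
    x = list(accumulated) + sh_basis_deg1(view_dir)
    hidden = [max(0.0, h) for h in _linear(x, params["w1"], params["b1"])]
    logits = _linear(hidden, params["w2"], params["b2"])
    return [1.0 / (1.0 + math.exp(-v)) for v in logits]


# One MLP evaluation per pixel, regardless of how many primitives the ray hit.
params = init_params(in_dim=2 + 4, hidden_dim=8)
print(decode_pixel([0.5, 0.5], (0.0, 0.0, 1.0), params))
```

The cost of the decoder is independent of the number of ray–primitive intersections, which is what makes a richer decoder affordable at real-time rates.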
Figure 3: At query time, harmonically encoded, interpolated per-primitive features are alpha-blended along the ray, and decoded via a shallow MLP in image space.
Empirical Evaluation
Radiance Field Synthesis
NHT is evaluated on standard benchmarks, including the Mip-NeRF 360 (Barron et al., 2022), Tanks and Temples, and Deep Blending datasets. NHT consistently improves over Spherical Harmonics (SH), Spherical Voronoi, and state-of-the-art appearance decoding schemes in both photometric and perceptual metrics (PSNR, SSIM, LPIPS) across a range of primitive counts. Notably, the accuracy improvements are most pronounced for high-frequency detail (specular effects, reflections) and in memory-constrained regimes with reduced primitive counts.
Figure 4: Qualitative comparison indicating NHT’s superior handling of high-frequency detail and view-dependent effects in complex real scenes.
As highlighted in the controlled ablation, at equal parameter and primitive budgets, NHT provides +0.3 dB to +0.6 dB PSNR and up to 25% lower LPIPS compared to the best prior primitive-based methods, at similar or better inference throughput (140+ FPS on high-end hardware).
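To put the PSNR gains in perspective, the short sketch below converts a PSNR difference into the corresponding ratio of mean squared errors, using the standard definition PSNR = 10 log10(peak² / MSE). The helper names are hypothetical, and the calculation assumes both methods are scored against the same peak value.

```python
import math


def psnr(mse, peak=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(delta_db):
    """Ratio MSE_new / MSE_old implied by a PSNR gain of `delta_db` dB."""
    return 10.0 ** (-delta_db / 10.0)


for gain in (0.3, 0.6):
    reduction = 1.0 - mse_ratio_for_gain(gain)
    print(f"+{gain} dB PSNR corresponds to {reduction:.1%} lower MSE")
```

Under that definition, a gain of 0.3 dB to 0.6 dB corresponds to roughly 7% to 13% lower mean squared error.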
Generality and Alternative Domains
NHT generalizes to other primitive-based domains. When applied as the appearance model for 2DGS and Triangle Splatting, it provides consistent quality improvements (ΔPSNR up to +0.8 dB for the same topology and runtime), showing that the benefits of localized harmonic encoding extend beyond any specific primitive type.
Figure 5: NHT applied to alternative primitive-based methods improves reconstruction quality uniformly across representations.
Semantic Fields and High-Dimensional Signal Fitting
Owing to the deferred decoding approach, NHT extends readily to high-dimensional signals. For example, in joint reconstruction of RGB and a 512-dimensional LSEG semantic field, NHT surpasses Feature 3DGS by +1.89 dB PSNR and +0.008 in cosine similarity, with 9× higher inference speed and 3× lower memory usage. This demonstrates both computational efficiency and the signal abstraction needed for downstream tasks such as semantic segmentation.
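For reference, the cosine similarity used to compare predicted and ground-truth feature vectors can be computed as in the minimal sketch below. The function name and the toy vectors are hypothetical; real LSEG features would have 512 channels per pixel.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 4-channel vectors standing in for per-pixel semantic features.
predicted = [0.9, 0.1, 0.0, 0.2]
target = [1.0, 0.0, 0.0, 0.25]
print(cosine_similarity(predicted, target))
```

The metric ignores vector magnitude, so it measures whether the predicted feature points in the same semantic direction as the target.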
Figure 6: Projected PCA visualization comparing ground-truth LSEG features versus NHT-predicted features, showing accurate spatial and semantic alignment.
2D Image Compression
When repurposed for high-resolution 2D image fitting, NHT uses a connected Delaunay mesh and Clough-Tocher C1 feature interpolation with harmonic encoding. On a curated set of 45.7MP HDR RAW images, it achieves perceptually superior reconstructions (38% lower LPIPS at 100× compression versus Instant NGP), while maintaining competitive PSNR. The architecture's parameter efficiency enables further post-training compression (e.g., quantization and entropy coding for up to 331× size reduction), preserving high image fidelity.
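The compression ratios can be translated into approximate storage budgets with a short calculation. The uncompressed size used here is an assumption made only for illustration (three channels at 16 bits each); the source does not state how the reference size is defined, so the absolute numbers are indicative at best. The function name is hypothetical.

```python
def compressed_size_mb(megapixels, ratio, channels=3, bytes_per_channel=2):
    """Approximate compressed size in MB for a given compression ratio.

    ASSUMPTION: the uncompressed image stores `channels` values of
    `bytes_per_channel` bytes per pixel; this is not stated in the source.
    """
    raw_bytes = megapixels * 1e6 * channels * bytes_per_channel
    return raw_bytes / ratio / 1e6


for ratio in (100, 331):
    size = compressed_size_mb(45.7, ratio)
    print(f"{ratio}x compression: about {size:.2f} MB per image")
```

Under that assumption, a 45.7MP image occupies about 274 MB uncompressed, so 100× compression leaves roughly 2.7 MB for the whole representation and 331× leaves under 1 MB.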
Figure 7: NHT achieves qualitative and quantitative improvements on compressed HDR images over hash-grid-based neural fields at equivalent or lower bitrates.
Figure 8: Visual comparison at 100× compression; NHT provides significantly lower LPIPS (perceptual error) than baseline methods at matched or better PSNR.
Implications, Limitations, and Prospective Directions
The NHT framework substantially increases the expressive bandwidth of each explicit primitive, decoupling geometry from appearance and enabling high-frequency detail with fewer primitives. The signal abstraction it offers opens the door to joint learning of radiance and semantic feature fields and supports real-time, scalable deployment. Because decoding is deferred, the decoding cost at inference can be varied, which suits both desktop and mobile applications.
Some limitations remain. The increased expressive power raises the risk of overfitting under extremely sparse supervision, and inference, while real-time, is modestly slower than non-neural 3DGS when using small appearance models. Future work could address automatic level-of-detail extraction, kernel-free variants for even higher performance, and extension to neural physically based rendering or radiance caching contexts.
Conclusion
Neural Harmonic Textures (2604.01204) represent a new paradigm for primitive-based neural scene representations, hybridizing the scalability and editability of explicit primitives with the signal capacity of neural fields. By localizing feature embeddings, employing harmonic positional activation, and deferring decoding, NHT simultaneously achieves state-of-the-art reconstruction quality, signal compactness, and broad application generality. This method provides a foundation for future research into flexible, expressive, and efficient neural graphics representations, with promising applications extending into real-time semantic and high-dimensional signal rendering.