- The paper presents a novel hierarchical implicit neural field approach leveraging octree-based multi-scale features for LiDAR odometry and mapping without ground-truth supervision.
- It introduces a self-supervised scan-to-implicit registration method using Levenberg-Marquardt optimization to minimize SDF residuals for precise pose estimation.
- Empirical results demonstrate state-of-the-art performance on datasets like KITTI with robust mesh reconstruction and real-time computational efficiency.
Hierarchical Implicit Neural Fields for LiDAR Odometry and Mapping: An Expert Review of Hi-LOAM
Introduction and Context
The Hi-LOAM framework proposes a hierarchical implicit neural field approach for LiDAR Odometry and Mapping (LOAM) in large-scale and complex 3D environments. The method addresses limitations of prior explicit and learning-based LOAM pipelines, most notably the tension among map fidelity, data efficiency, the need for pose supervision, and generalizability across varied datasets. By organizing multiscale neural features in an octree-structured hash table and coupling them with self-supervised, correspondence-free scan-to-implicit matching, Hi-LOAM offers a robust alternative to both traditional ICP-based and recent implicit neural odometry pipelines.
Figure 1: System overview—LiDAR samples are used for ray-based neural SDF construction, with odometry and mapping modules operating over hierarchical features in an octree.
Methodological Contributions
Hierarchical Multi-scale Neural Feature Embedding
At the core of Hi-LOAM is a hierarchical feature encoding mechanism. Each LiDAR point cloud is processed by sampling along rays, with samples placed both in free space and near object surfaces. The sampled points are transformed into the world frame and used to construct latent features at multiple octree levels. A Morton code-based hash function maps 3D spatial locations to unique feature codes, and the features from the finest three octree levels are concatenated into a multiscale representation. The concatenated features feed a shallow MLP decoder that outputs a signed distance field (SDF) value for each queried point, thereby encoding scene geometry and enabling continuous surface estimation.
Figure 2: Hierarchical feature lookup with Morton-coded octree—node/corner correspondence ensures fast feature access and spatial adaptivity.
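The lookup described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `morton3d`, `HierarchicalFeatures`, and `OFFSET`, the feature dimension of 4, the random initialization, and the use of one feature per voxel (rather than per-corner features with interpolation) are all assumptions made for brevity.

```python
import math
import random

OFFSET = 1 << 20  # assumed shift so negative coordinates give non-negative indices


def morton3d(ix, iy, iz, bits=21):
    """Interleave the low `bits` bits of three non-negative integers (Z-order code)."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code


class HierarchicalFeatures:
    """Toy multi-level feature table keyed by (level, Morton code). Hypothetical."""

    def __init__(self, leaf_size=0.2, levels=3, dim=4, seed=0):
        self.leaf_size = leaf_size  # edge length of the finest voxels, in metres
        self.levels = levels        # level 0 is the finest level
        self.dim = dim
        self.table = {}
        self.rng = random.Random(seed)

    def key(self, point, level):
        size = self.leaf_size * (2 ** level)  # voxel edge doubles per coarser level
        idx = [math.floor(c / size) + OFFSET for c in point]
        return (level, morton3d(*idx))

    def insert(self, point):
        """Allocate a feature at every level for the voxel containing `point`."""
        for level in range(self.levels):
            k = self.key(point, level)
            if k not in self.table:
                self.table[k] = [self.rng.gauss(0.0, 0.01) for _ in range(self.dim)]

    def lookup(self, point):
        """Concatenate the features of all levels; unseen voxels contribute zeros."""
        feat = []
        for level in range(self.levels):
            feat.extend(self.table.get(self.key(point, level), [0.0] * self.dim))
        return feat


fmap = HierarchicalFeatures()
a, b = (0.05, 0.05, 0.05), (0.25, 0.05, 0.05)
fmap.insert(a)
fmap.insert(b)
fa, fb = fmap.lookup(a), fmap.lookup(b)
```

In this toy setup the two points fall into different leaf voxels but share the same coarser voxels, so `fa` and `fb` differ only in their finest-level slice. In the method under review, a concatenated vector of this kind would then be passed to the MLP decoder.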
Self-supervised Scan-to-Implicit Map Registration
Unlike standard ICP-based pose estimation, Hi-LOAM registers each scan directly against the implicit map. The pose is optimized with a Levenberg-Marquardt (LM) solver that minimizes the squared SDF residual (L2 loss) over all scan points. The hierarchical feature structure allows for sharply localized or more context-aware registration depending on environmental structure, which benefits both generalization and resilience to dynamic objects. Feature and MLP weights are frozen during pose optimization, which sidesteps overfitting and catastrophic forgetting during registration.
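A minimal sketch of this kind of registration follows, under strong simplifying assumptions: the frozen neural SDF is replaced by the analytic SDF of a unit sphere, only the translation is optimized (the method itself estimates a full pose), and the damping schedule is a generic LM heuristic. The helper names `sdf_sphere`, `solve3`, `cost`, `register_translation`, and `sphere_points` are hypothetical.

```python
import math


def sdf_sphere(p, radius=1.0):
    """Stand-in for the frozen SDF: signed distance to a sphere, plus its gradient."""
    norm = math.sqrt(sum(x * x for x in p))
    return norm - radius, [x / norm for x in p]


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(3):
        piv = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            for k in range(c, 4):
                M[r][k] -= f * M[c][k]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][k] * x[k] for k in range(r + 1, 3))) / M[r][r]
    return x


def cost(points, t):
    """Sum of squared SDF residuals of the translated scan."""
    return sum(sdf_sphere([p[i] + t[i] for i in range(3)])[0] ** 2 for p in points)


def register_translation(points, iters=50, lam=1e-3):
    """Levenberg-Marquardt over a translation t that minimizes sum_i sdf(p_i + t)^2."""
    t = [0.0, 0.0, 0.0]
    for _ in range(iters):
        JtJ = [[0.0] * 3 for _ in range(3)]
        Jtr = [0.0] * 3
        for p in points:
            r, g = sdf_sphere([p[i] + t[i] for i in range(3)])  # residual, Jacobian row
            for i in range(3):
                Jtr[i] += g[i] * r
                for j in range(3):
                    JtJ[i][j] += g[i] * g[j]
        # Damped normal equations: (J^T J + lam * diag(J^T J)) step = -J^T r
        A = [[JtJ[i][j] + (lam * JtJ[i][i] if i == j else 0.0) for j in range(3)]
             for i in range(3)]
        step = solve3(A, [-v for v in Jtr])
        cand = [t[i] + step[i] for i in range(3)]
        if cost(points, cand) < cost(points, t):
            t, lam = cand, lam * 0.5   # accept the step and trust the model more
        else:
            lam *= 10.0                # reject the step and increase damping
    return t


def sphere_points(n):
    """Deterministic, roughly uniform points on the unit sphere (Fibonacci lattice)."""
    pts = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        rad = math.sqrt(1.0 - z * z)
        phi = k * math.pi * (3.0 - math.sqrt(5.0))
        pts.append([rad * math.cos(phi), rad * math.sin(phi), z])
    return pts


t_true = [0.10, -0.05, 0.08]
scan = [[q[i] - t_true[i] for i in range(3)] for q in sphere_points(200)]
t_est = register_translation(scan)
```

Because the map (here, the sphere) stays fixed while only `t` changes, the sketch mirrors the point that map parameters are frozen during pose optimization. No point correspondences are computed at any step.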
Map Integration and Mesh Extraction
Submaps are built and registered using local features, which are subsequently merged into a global feature map via efficient table update algorithms that exploit the underlying hash-table and octree structure. To visualize and evaluate mapping output, Marching Cubes is applied on the inferred SDF to extract watertight triangular meshes, resulting in high-fidelity 3D reconstructions.
Figure 3: Example of reconstructed map and trajectory on KITTI sequence 07—demonstrating both mapping quality and pose accuracy.
Figure 4: Qualitative mesh reconstruction results on multiple KITTI sequences, highlighting scalability and mesh accuracy in large environments.
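The review does not spell out the table update used when a submap is merged into the global map. As a purely illustrative sketch, assume both maps are hash tables from a voxel key to a feature vector and an observation count, and that overlapping entries are combined by a count-weighted average. The function name `merge_submap` and this averaging policy are assumptions, not the paper's algorithm.

```python
def merge_submap(global_map, submap):
    """Merge `submap` into `global_map` in place.

    Both arguments map a voxel key to a (feature list, observation count) pair.
    New keys are copied over; shared keys are blended by observation count.
    """
    for key, (feat, n) in submap.items():
        if key in global_map:
            g_feat, g_n = global_map[key]
            total = g_n + n
            blended = [(g_n * a + n * b) / total for a, b in zip(g_feat, feat)]
            global_map[key] = (blended, total)
        else:
            global_map[key] = (list(feat), n)
    return global_map


global_map = {1: ([1.0, 1.0], 1)}
submap = {1: ([3.0, 5.0], 3), 2: ([0.5, 0.5], 2)}
merge_submap(global_map, submap)
```

Since each key is touched once, the merge cost is linear in the number of submap entries, which is the kind of efficiency a hash-table layout makes possible.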
Empirical Evaluation and Numerical Results
Hi-LOAM matches or surpasses state-of-the-art methods on localization benchmarks. On KITTI odometry, it achieves the best or second-best results in average relative translation error and absolute trajectory error across all sequences, on par with KISS-ICP, Mesh-LOAM, and PIN-SLAM in localization accuracy. Importantly, these results are obtained in a self-supervised regime without any pre-training or ground-truth (GT) pose supervision, and they are consistent across both RMSE and drift metrics.
Figure 5: KITTI Sequence 00 odometry. Hi-LOAM trajectories closely follow ground truth, outperforming both explicit and other neural methods.
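For readers unfamiliar with the absolute trajectory error, the sketch below computes its RMSE from positions only. It assumes the estimated and ground-truth trajectories are already time-synchronized and expressed in the same frame; any alignment step, and the KITTI relative translation error, are outside its scope. The function name `ate_rmse` is hypothetical.

```python
import math


def ate_rmse(est_positions, gt_positions):
    """Root-mean-square of per-frame position errors between two aligned trajectories."""
    if not est_positions or len(est_positions) != len(gt_positions):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [math.dist(e, g) ** 2 for e, g in zip(est_positions, gt_positions)]
    return math.sqrt(sum(squared) / len(squared))


gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.3, 0.4)]
rmse = ate_rmse(est, gt)  # only the last frame is off, by 0.5 m
```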
Robustness in Challenging Environments
On the highly dynamic and long-range MulRAN, Hilti, and SemanticPOSS datasets, Hi-LOAM maintains competitive performance despite having no loop closure routine, outperforming PIN-SLAM and significantly outperforming NeRF-LOAM, which generalizes poorly in dynamic or repetitive scenes. On handheld, vibration-dominated datasets such as Newer College and Hilti-23 (with featureless or highly repetitive structures), the hierarchical embedding yields more robust pose estimates where other learning-based methods fail to initialize or diverge.
Figure 6: MulRAN dataset results depicting close adherence of estimated trajectory to ground truth in complex urban settings.
Figure 7: Hilti-23 (construction site). Hi-LOAM maintains mesh integrity and pose accuracy despite severe sensor and environment ambiguity.
3D Reconstruction Accuracy
Hi-LOAM demonstrates state-of-the-art mesh reconstruction. On both synthetic (Mai City) and real (Newer College) datasets, it achieves lower mean errors (e.g., completion, accuracy, Chamfer-L1) and higher F-scores under 10-20 cm tolerances than SHINE, VDB-Fusion, SLAMesh, NeRF-LOAM, and PIN-SLAM.
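The metrics named above can be sketched as follows, using their commonly used definitions on point sets sampled from the reconstructed and reference surfaces. Whether the paper's evaluation follows exactly these conventions is an assumption, and the brute-force nearest-neighbour search is only suitable for tiny examples. The function names are hypothetical.

```python
import math


def nearest_distances(src, dst):
    """Distance from each point in `src` to its nearest neighbour in `dst`."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def reconstruction_metrics(pred, gt, tau=0.1):
    """Accuracy, completion, Chamfer-L1, and F-score at distance threshold `tau`."""
    d_pred = nearest_distances(pred, gt)  # reconstruction -> reference
    d_gt = nearest_distances(gt, pred)    # reference -> reconstruction
    accuracy = sum(d_pred) / len(d_pred)
    completion = sum(d_gt) / len(d_gt)
    precision = sum(d <= tau for d in d_pred) / len(d_pred)
    recall = sum(d <= tau for d in d_gt) / len(d_gt)
    denom = precision + recall
    return {
        "accuracy": accuracy,
        "completion": completion,
        "chamfer_l1": 0.5 * (accuracy + completion),
        "f_score": 0.0 if denom == 0 else 2.0 * precision * recall / denom,
    }


gt_pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred_pts = [(0.0, 0.0, 0.05), (1.0, 0.0, 0.3)]
metrics = reconstruction_metrics(pred_pts, gt_pts, tau=0.1)
```

With a 10 cm threshold, one of the two predicted points counts as correct, so precision, recall, and F-score are all 0.5 in this toy case.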
Figure 8: Mai City mesh and error analysis. Hi-LOAM produces the smallest red (error) region, outperforming other implicit and explicit methods.
Figure 9: Newer College mesh and error analysis. The error maps show that Hi-LOAM reduces large-deviation zones.
Figure 10: Detailed mesh visualization of complex indoor/outdoor topologies from hand-held LiDAR, substantiating high reconstruction fidelity.
Architectural and Training Analysis
Ablation studies demonstrate that hierarchical feature composition (three-level concatenation) offers consistent accuracy improvements over single-level feature embeddings, with diminishing returns above three levels due to memory/computation trade-offs. Sensitivity experiments on leaf node size, octree depth, and submap size provide concrete guidelines for tuning according to environment scale: leaf node sizes of 0.2–0.25 m and octree depths of 13–15 for large-scale outdoor scenes. Notably, LM converges more reliably than Adam for pose optimization; Adam is sensitive to the learning rate and drifts or diverges on challenging sequences.
Figure 11: Adam vs. LM optimizer—LM maintains trajectory integrity, whereas Adam frequently experiences drift and divergence on KITTI 06.
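To give a feel for what the quoted leaf sizes and depths imply, the short calculation below assumes a conventional octree in which the cell edge halves at every level, so that the root cell edge equals the leaf size times 2 raised to the depth. That relationship, and the function name `octree_extent`, are assumptions rather than details stated in the review.

```python
def octree_extent(leaf_size_m, depth):
    """Edge length in metres of the root cell, assuming the edge halves per level."""
    return leaf_size_m * (2 ** depth)


small = octree_extent(0.2, 13)   # 0.2 m leaves, depth 13
large = octree_extent(0.25, 15)  # 0.25 m leaves, depth 15
```

Under this assumption the recommended settings span roughly 1.6 km to 8.2 km per axis, which is consistent with their use for large-scale outdoor sequences.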
Resource Efficiency
Hi-LOAM achieves high performance with moderate computational and storage overhead. While its memory consumption is higher than that of PIN-SLAM due to the multi-level hierarchy, it is well below that of traditional explicit surfel or point cloud maps. Pose estimation is reported as real-time capable at roughly 1.7 Hz per frame, although this is below the 10 Hz scan rate of the KITTI LiDAR; most compute is spent on map optimization rather than runtime localization.
Implications and Forward Outlook
The results substantiate the advantages of scan-to-implicit neural matching for LiDAR odometry and reconstruction, particularly in self-supervised regimes and harsh real-world conditions lacking pose annotations. The ability to adaptively allocate geometric detail via hierarchical features makes the method applicable to both sparse outdoor and dense indoor contexts.
Practically, this architecture offers a scalable solution for embodied AI agents (autonomous vehicles, robots, and mixed-reality systems) where robustness to sensor, scene, and supervision variation is essential. The octree-based hash structure improves the memory-accuracy trade-off, and scan-to-implicit registration is well positioned for future extensions such as integration of multi-modal cues (camera, IMU), adaptive loop closure, semantic segmentation, and deployment across heterogeneous sensor types.
Theoretically, Hi-LOAM highlights the merit of hybrid explicit-implicit representations for solving large-scale SLAM without labeled ground truth, and motivates further research into hierarchical neural field generalization, efficient optimizers, and compact map compression.
Conclusion
Hi-LOAM presents a robust, memory-efficient, and highly accurate self-supervised LOAM methodology based on hierarchical implicit neural fields. The octree-based multi-scale feature embedding and pose optimization via a scan-to-implicit-map loss yield state-of-the-art results across odometry and mapping metrics over diverse and challenging datasets, with substantial improvements over prior implicit and mesh-based approaches. The framework demonstrates strong generalizability and offers a viable foundation for subsequent research into scalable, real-world embodied AI navigation with minimal domain adaptation.