Improving Local Feature Matching by Entropy-inspired Scale Adaptability and Flow-endowed Local Consistency

Published 8 Apr 2026 in cs.CV | (2604.06713v1)

Abstract: Recent semi-dense image matching methods have achieved remarkable success, but two long-standing issues still impair their performance. At the coarse stage, the over-exclusion issue of their mutual nearest neighbor (MNN) matching layer makes them struggle to handle cases with scale difference between images. To this end, we comprehensively revisit the matching mechanism and make a key observation that the hint concealed in the score matrix can be exploited to indicate the scale ratio. Based on this, we propose a scale-aware matching module which is exceptionally effective but introduces negligible overhead. At the fine stage, we point out that existing methods neglect the local consistency of final matches, which undermines their robustness. To this end, rather than independently predicting the correspondence for each source pixel, we reformulate the fine stage as a cascaded flow refinement problem and introduce a novel gradient loss to encourage local consistency of the flow field. Extensive experiments demonstrate that our novel matching pipeline, with these proposed modifications, achieves robust and accurate matching performance on downstream tasks.


Authors (3)

Summary

The paper presents entropy-driven scale adaptability to dynamically adjust matching windows, enhancing recall under scale variance.
It introduces cascaded flow refinement with gradient loss, boosting spatial coherence and achieving accurate correspondence estimation.
Experimental results demonstrate improvements in relative pose, homography, and visual localization tasks compared to state-of-the-art methods.

Entropy-driven Scale Adaptability and Cascaded Flow Consistency in Semi-dense Local Feature Matching

Introduction

This work addresses two fundamental issues in semi-dense image matching: over-exclusion by the mutual nearest neighbor (MNN) matching layer when the two images differ in scale, and locally inconsistent flow at the fine stage. The pipeline builds on the LoFTR paradigm and adds entropy-based scale analysis at the coarse stage and cascaded flow refinement with a gradient loss at the fine stage, which makes matching robust to scale variance and improves downstream geometric tasks.

Figure 1: Overview of the method; multi-level features are extracted by Siamese CNN, entropy heuristics drive scale-aware matching, and local consistency is enforced via cascaded convolutions.

Scale-aware Coarse Matching by Score Entropy

The over-exclusion problem is inherent in dual-softmax MNN, particularly in many-to-one scenarios induced by scale variations. The key innovation is leveraging the entropy of score distributions between patch pairs to distinguish co-visible regions from unmatched areas:

Entropy as Co-visibility Indicator: Concentrated (near one-hot) score distributions indicate co-visible patches, while diffuse distributions indicate unmatchable points.
Scale Ratio Estimation: By thresholding entropy, the method partitions each image into co-visible sets and estimates the dominant scale ratio.
Adaptive MNN (AMNN): The scale ratio determines the inspection window for cycle consistency checks, relaxing the strict one-to-one constraint while preserving rejection capability.
Figure 2: Score heatmaps reveal entropy patterns; co-visible regions show sharp, one-hot responses, while unmatchable points produce diffuse, high-entropy distributions.
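
The entropy test and the scale estimate described above can be illustrated with a minimal sketch. The summary does not give the paper's exact formulas, so several choices here are assumptions made for illustration: the entropy is normalized by log(n) so that the threshold `tau` lies in [0, 1], the scores are treated as already temperature-scaled, and the scale ratio is taken as the square root of the ratio of co-visible patch counts. All function names are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of matching scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_entropy(probs):
    """Shannon entropy divided by log(n), so the result lies in [0, 1]."""
    n = len(probs)
    if n <= 1:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(n)

def covisible_mask(scores, tau):
    """True for each row whose score distribution is concentrated (entropy below tau)."""
    return [normalized_entropy(softmax(row)) < tau for row in scores]

def estimate_scale_ratio(scores, tau):
    """ASSUMED estimator: sqrt of the co-visible patch count ratio (image B over image A)."""
    mask_a = covisible_mask(scores, tau)                              # rows: patches of A
    mask_b = covisible_mask([list(c) for c in zip(*scores)], tau)     # columns: patches of B
    n_a, n_b = sum(mask_a), sum(mask_b)
    if n_a == 0 or n_b == 0:
        return 1.0  # no co-visible region found; fall back to equal scale
    return math.sqrt(n_b / n_a)

# Toy score matrix: 3 patches in A (rows), 6 patches in B (columns).
# Patch A0 responds strongly to B0..B3 (many-to-one); everything else is flat.
scores = [
    [10.0, 10.0, 10.0, 10.0, 0.0, 0.0],
    [0.0] * 6,
    [0.0] * 6,
]
print(covisible_mask(scores, tau=0.9))        # [True, False, False]
print(estimate_scale_ratio(scores, tau=0.9))  # 2.0
```

In this toy case one patch of A covers four patches of B, so the assumed estimator returns a linear scale ratio of 2. The loose threshold of 0.9 is chosen only so that the four-way response of A0 still counts as concentrated.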

Figure 3: AMNN inspection windows allow many-to-one matching, increasing recall under scale variance.
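
One way to read the inspection window is as a relaxed cycle-consistency check, sketched below. This is an illustrative interpretation rather than the paper's implementation: the function name, the row-major grid layout, the Chebyshev window, and the mapping from scale ratio to `radius` (for example `ceil(ratio) - 1`) are all assumptions. The confidence threshold that LoFTR-style matchers also apply is omitted for brevity.

```python
def adaptive_mnn(scores, width_b, radius):
    """Relaxed mutual-nearest-neighbor check (illustrative sketch).

    scores[i][j] is the similarity between patch i of image A and patch j of
    image B, where B's patches lie on a row-major grid with `width_b` columns.
    A pair (i, j) is kept when the best B-patch for i lies within `radius`
    grid cells of j. With radius=0 this reduces to strict MNN.
    """
    n_a, n_b = len(scores), len(scores[0])
    matches = []
    for j in range(n_b):
        i = max(range(n_a), key=lambda a: scores[a][j])        # nearest neighbor B -> A
        j_back = max(range(n_b), key=lambda b: scores[i][b])   # nearest neighbor A -> B
        dy = abs(j // width_b - j_back // width_b)
        dx = abs(j % width_b - j_back % width_b)
        if max(dx, dy) <= radius:                              # inside the inspection window
            matches.append((i, j))
    return matches

# Image B is a 3x3 grid. Patch A0 covers B's top-left 2x2 block (B0, B1, B3, B4),
# and patch A1 matches B8. The remaining B patches are unmatchable.
scores = [
    [9.0, 8.0, 0.1, 8.5, 7.5, 0.1, 0.1, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0],
]
print(adaptive_mnn(scores, width_b=3, radius=0))  # [(0, 0), (1, 8)]
print(adaptive_mnn(scores, width_b=3, radius=1))  # [(0, 0), (0, 1), (0, 3), (0, 4), (1, 8)]
```

With `radius=0` three of the four valid pairs for A0 are discarded, which is the over-exclusion described above. With `radius=1` they are recovered, while the unmatchable patches B2, B5, B6, and B7 are still rejected because they fall outside the window.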

This entropy-inspired matching eliminates manual scale segmentation or patch subdivision, maintaining efficiency and differentiability.

At the fine stage, standard refinement predicts the correspondence of each source pixel independently, which can produce spatially incoherent flow. The proposed approach has two parts:

Cascaded Flow Refinement: Convolution modules propagate local context, updating coarse flows through recursive residual predictions across feature hierarchies.
Flow Gradient Loss: Augments the standard $l_2$ regression with explicit gradient supervision, enforcing local continuity and smoothness. This penalizes contradictory flow directions between adjacent pixels, mitigating distortions.
Figure 4: Visualization of local consistency: independent refinement can produce conflicting flows, while gradient supervision maintains spatial coherence.
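
To make the effect of gradient supervision concrete, here is a small sketch. The summary does not give the exact form of the loss, so the version below is an assumption: it compares forward differences of the predicted and ground-truth flow fields and adds that term to a mean endpoint error with a hypothetical weight `weight`.

```python
import math

def l2_loss(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth flow vectors."""
    dists = [math.hypot(pu - gu, pv - gv)
             for row_p, row_g in zip(pred, gt)
             for (pu, pv), (gu, gv) in zip(row_p, row_g)]
    return sum(dists) / len(dists)

def forward_diffs(flow):
    """Forward differences of a flow field along x, then along y."""
    h, w = len(flow), len(flow[0])
    dx = [(flow[y][x + 1][0] - flow[y][x][0], flow[y][x + 1][1] - flow[y][x][1])
          for y in range(h) for x in range(w - 1)]
    dy = [(flow[y + 1][x][0] - flow[y][x][0], flow[y + 1][x][1] - flow[y][x][1])
          for y in range(h - 1) for x in range(w)]
    return dx + dy

def gradient_loss(pred, gt):
    """ASSUMED form: mean distance between predicted and ground-truth flow gradients."""
    dp, dg = forward_diffs(pred), forward_diffs(gt)
    return sum(math.hypot(a[0] - b[0], a[1] - b[1]) for a, b in zip(dp, dg)) / len(dp)

def total_loss(pred, gt, weight=1.0):
    return l2_loss(pred, gt) + weight * gradient_loss(pred, gt)

# Ground truth: a constant flow of one pixel to the right on a 2x2 grid.
gt = [[(1.0, 0.0), (1.0, 0.0)], [(1.0, 0.0), (1.0, 0.0)]]
# Both predictions are off by 0.5 px everywhere, so their l2 terms are equal.
shifted = [[(1.5, 0.0), (1.5, 0.0)], [(1.5, 0.0), (1.5, 0.0)]]   # coherent offset
jagged = [[(1.5, 0.0), (0.5, 0.0)], [(0.5, 0.0), (1.5, 0.0)]]    # neighbors disagree

print(total_loss(shifted, gt))  # 0.5
print(total_loss(jagged, gt))   # 1.5
```

The two predictions have the same per-pixel error, but only the one whose adjacent pixels contradict each other is penalized by the gradient term.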

Certainty maps are likewise upsampled and used to filter reliable dense correspondences.
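
The cascade itself can be pictured as a loop that repeatedly upsamples the current flow and adds a predicted residual. The sketch below is a simplification under stated assumptions: each level doubles the resolution, flow vectors are measured in pixels of the current grid (so they double on upsampling), and `residual_fns` is a hypothetical list of callables standing in for the convolutional modules.

```python
def upsample2x(flow):
    """Nearest-neighbor 2x upsampling of a flow field.

    Vectors are doubled because they are assumed to be in pixel units of the grid.
    """
    out = []
    for row in flow:
        fine_row = []
        for (u, v) in row:
            fine_row += [(2 * u, 2 * v), (2 * u, 2 * v)]
        out += [fine_row, list(fine_row)]
    return out

def cascade_refine(coarse_flow, residual_fns):
    """Upsample, predict a residual, add it; repeat once per level."""
    flow = coarse_flow
    for predict_residual in residual_fns:
        flow = upsample2x(flow)
        residual = predict_residual(flow)  # same shape as flow
        flow = [[(u + du, v + dv) for (u, v), (du, dv) in zip(f_row, r_row)]
                for f_row, r_row in zip(flow, residual)]
    return flow

def constant_residual(flow):
    """Stand-in for a learned module: nudges every vector by +0.25 px in x."""
    return [[(0.25, 0.0) for _ in row] for row in flow]

coarse = [[(1.0, -0.5)]]
fine = cascade_refine(coarse, [constant_residual, constant_residual])
print(len(fine), len(fine[0]))  # 4 4
print(fine[0][0])               # (4.75, -2.0)
```

Because each residual is predicted from the whole upsampled field rather than per pixel, a real convolutional module in place of `constant_residual` would see neighboring flows when it makes its update.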

Experimental Results and Numerical Analysis

The pipeline is validated across relative pose estimation, homography estimation, and visual localization benchmarks.

Relative Pose Estimation: On MegaDepth ($\mathrm{AUC}@5^\circ$), the method achieves 59.6 vs. 56.4 for EfficientLoFTR and outperforms competing semi-dense and sparse methods. On ScanNet, the pipeline maintains competitive accuracy and is notably efficient, with 56.1 ms per image pair at $640\times480$.
Homography Estimation: HPatches results indicate superior AUC (70.5 @3px) over previous state-of-the-art semi-dense and sparse approaches.
Visual Localization: On InLoc and Aachen v1.1, the method achieves comparable recall rates to dense matchers but at significantly lower computational cost.
Figure 5: Indoor qualitative results show increased match quantity and accuracy with scale-aware matching.
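
For readers unfamiliar with the AUC metrics quoted above, the sketch below shows a convention that is widely used for relative-pose benchmarks: the area under the cumulative error curve up to a threshold, normalized by that threshold. The summary does not state the exact evaluation protocol, so treat this as an assumed definition. The per-pair error is typically the larger of the rotation and translation angular errors, in degrees.

```python
from bisect import bisect_left

def pose_auc(errors, threshold):
    """Normalized area under the recall-vs-error curve, integrated up to `threshold`."""
    errs = sorted(errors)
    n = len(errs)
    xs = [0.0] + errs                                   # error values
    ys = [0.0] + [(k + 1) / n for k in range(n)]        # recall at each error value
    last = bisect_left(xs, threshold)                   # points strictly below the threshold
    xs = xs[:last] + [threshold]
    ys = ys[:last] + [ys[last - 1]]                     # hold recall flat up to the threshold
    area = sum((xs[k + 1] - xs[k]) * (ys[k + 1] + ys[k]) / 2   # trapezoidal rule
               for k in range(len(xs) - 1))
    return area / threshold

# Three image pairs with pose errors of 1, 2 and 10 degrees, evaluated at 5 degrees.
print(round(pose_auc([1.0, 2.0, 10.0], threshold=5.0), 4))  # 0.5333
```

A score of 1.0 means every pair has zero error, and a pair whose error exceeds the threshold contributes nothing. Reported values such as 59.6 correspond to this quantity expressed as a percentage.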

Figure 6: Outdoor results demonstrate that AMNN matching recovers additional valid correspondences under scale disparity.

Figure 7: Pose estimation $\mathrm{AUC}@5^\circ$ remains stable as image resolution decreases, outperforming EfficientLoFTR and JamMa.

Figure 8: In repetitive regions, entropy-based selection distinguishes co-visible from unmatchable points; responses in the latter are less localized.

Ablation studies confirm the necessity of each proposed element: AMNN matching, cascaded refinement, and gradient loss each contribute essential improvements, as reflected in pose estimation metrics and qualitative flow consistency.

Practical and Theoretical Implications

Efficiency without Compromise: The method achieves dense correspondences in a semi-dense framework, balancing computational efficiency and robustness, which is crucial for real-time visual odometry, SfM, and SLAM.
Adaptability to Scale and Resolution: Entropy-driven matching is resilient to both scale-ratio variations and resolution changes, reducing the need for manual intervention or domain-specific tuning.
Consistency for Downstream Tasks: Flow gradient supervision yields correspondences with more reliable spatial structure, improving performance in tasks sensitive to geometric consistency, such as camera pose recovery and homography estimation.
Potential for Extension: The global scale approximation via entropy is robust under typical conditions, but less suited to strongly spatially non-uniform scale variations or extreme distortion. Dense matchers remain superior in these rare cases but at much higher computational cost.

Limitations and Future Directions

The global nature of scale estimation via entropy does not explicitly address highly non-uniform local scale changes, such as those induced by depth discontinuities or severe perspective distortion.
Dense matchers (e.g., DKM, RoMa) are more robust under extreme viewpoint changes but come at a significantly higher computational cost.
Scaling the pipeline to massive 3D data or adapting for foundation models remains open for future research.

Conclusion

The paper advances semi-dense matching pipelines by incorporating entropy-inspired scale adaptation and flow-endowed local consistency, substantially improving recall, precision, and spatial coherence in local feature matching. The proposed method achieves robust performance across key computer vision benchmarks and offers a versatile solution for geometric tasks requiring efficient yet precise image correspondence estimation (2604.06713).
