Online Invariance Selection for Local Feature Descriptors

Published 17 Jul 2020 in cs.CV | (2007.08988v3)

Abstract: To be invariant, or not to be invariant: that is the question formulated in this work about local descriptors. A limitation of current feature descriptors is the trade-off between generalization and discriminative power: more invariance means less informative descriptors. We propose to overcome this limitation with a disentanglement of invariance in local descriptors and with an online selection of the most appropriate invariance given the context. Our framework consists in a joint learning of multiple local descriptors with different levels of invariance and of meta descriptors encoding the regional variations of an image. The similarity of these meta descriptors across images is used to select the right invariance when matching the local descriptors. Our approach, named Local Invariance Selection at Runtime for Descriptors (LISRD), enables descriptors to adapt to adverse changes in images, while remaining discriminative when invariance is not required. We demonstrate that our method can boost the performance of current descriptors and outperforms state-of-the-art descriptors in several matching tasks, when evaluated on challenging datasets with day-night illumination as well as viewpoint changes.

Summary

In the paper "Online Invariance Selection for Local Feature Descriptors," the authors propose a framework that addresses a limitation of current feature descriptors: the trade-off between discriminative power and invariance. Traditional feature description forces a choice between the two, since increased invariance yields less informative, and thus less discriminative, descriptors. The paper introduces Local Invariance Selection at Runtime for Descriptors (LISRD), a mechanism that adapts descriptors at matching time based on contextual information derived from the images.

Core Concepts and Methodology

The key innovation in this paper is the disentanglement of invariance levels within local descriptors, which allows selecting the appropriate level of invariance dynamically based on image context. The main components of LISRD include:

Disentangled Invariance in Descriptors: The approach is built on the premise that feature descriptors can have varying levels of invariance to specific image transformations, such as illumination and rotation changes. A single network is trained to compute several descriptors with different invariance properties, so that each one suits a different matching scenario.
Meta Descriptors: Meta descriptors use regional image context to decide which invariance to apply. Each one summarizes the local descriptors of an image region, aggregated through a NetVLAD layer, and thereby encodes the regional variations of the image.
Online Selection: At runtime, when matching local descriptors between images, the framework uses meta descriptors to guide the selection of the most suitable invariance level. This is achieved through a computed distance metric that weights local descriptor distances by their regional meta descriptor similarities.
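The NetVLAD-style aggregation behind the meta descriptors can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function name, the `alpha` parameter, and the use of fixed cluster centers with distance-based soft assignment (instead of learned assignment weights) are assumptions made for the example.

```python
import numpy as np

def netvlad_aggregate(local_desc, centers, alpha=10.0):
    """Aggregate the local descriptors of one region into a meta descriptor.

    local_desc: (N, D) local descriptors from one image region.
    centers:    (K, D) cluster centers (learned in NetVLAD, fixed here).
    alpha:      hypothetical sharpness of the soft assignment.
    Returns a unit-norm vector of length K * D.
    """
    # Residual of every descriptor to every cluster center: (N, K, D).
    residuals = local_desc[:, None, :] - centers[None, :, :]

    # Soft-assign each descriptor to the clusters (softmax over clusters).
    logits = -alpha * (residuals ** 2).sum(axis=-1)       # (N, K)
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)

    # Sum the assignment-weighted residuals per cluster: (K, D).
    vlad = (assign[:, :, None] * residuals).sum(axis=0)

    # Normalize each cluster row, flatten, then L2-normalize the result.
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    vlad = vlad.ravel()
    return vlad / (np.linalg.norm(vlad) + 1e-12)
```

Calling this once per image region and per invariance level would give one meta descriptor for each, which is the granularity the selection step needs.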

The LISRD framework is versatile and can integrate both learned and handcrafted descriptors, as demonstrated through experiments applying the methodology to descriptors like SIFT and Upright SIFT.
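Because the selection step only needs several descriptor variants per keypoint, it looks the same whether those variants are learned or handcrafted (for example, two variants for a rotation-invariant and a rotation-variant descriptor). The sketch below is a hedged illustration of that step. The function name, the array layout, and the choice of a softmax over variants to turn meta descriptor similarities into weights are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lisrd_style_distance(desc_a, desc_b, meta_a, meta_b):
    """Combine per-variant descriptor distances using meta descriptor similarity.

    desc_a: (V, Na, D) unit-norm local descriptors of image A, one set per variant.
    desc_b: (V, Nb, D) same for image B.
    meta_a: (V, Na, M) unit-norm meta descriptor of the region around each keypoint.
    meta_b: (V, Nb, M) same for image B.
    Returns an (Na, Nb) matrix of weighted distances.
    """
    # Per-variant distance between every pair of local descriptors.
    local_dist = 1.0 - np.einsum('vid,vjd->vij', desc_a, desc_b)

    # Per-variant similarity between the regional meta descriptors.
    meta_sim = np.einsum('vim,vjm->vij', meta_a, meta_b)

    # Variants whose regional context agrees across images get more weight.
    weights = softmax(meta_sim, axis=0)

    return (weights * local_dist).sum(axis=0)
```

Matches can then be taken as nearest neighbors in the returned distance matrix, exactly as with a single descriptor.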

Experimental Evaluation

The paper evaluates LISRD in three experimental settings:

Benchmarking Against SOTA: The LISRD framework was evaluated against state-of-the-art descriptors on datasets such as HPatches, whose images are related by homographies and exhibit illumination and viewpoint changes. The proposed framework consistently showed improved precision and recall, supporting dynamic invariance selection over descriptors with a fixed level of invariance.
Challenging Conditions: An additional evaluation used a new day-night image matching dataset featuring wide baselines, illumination changes, and synthetic rotations. Here, LISRD demonstrated superior matching accuracy compared to established local descriptors, highlighting its ability to adapt to challenging conditions.
Application to Visual Localization: Finally, LISRD was applied to a practical visual localization task using the Aachen Day-Night dataset. The results illustrated its robustness across different keypoint types, improving the localization performance relative to other descriptor methods.
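For datasets where image pairs are related by a known homography, matching precision can be computed by warping keypoints with the ground-truth homography and counting matches that land close to their partner. The sketch below assumes that setup. The function names and the 3-pixel threshold are illustrative choices, not values taken from the paper.

```python
import numpy as np

def warp_points(points, H):
    """Apply a 3x3 homography H to an (N, 2) array of (x, y) points."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coords
    warped = pts_h @ H.T
    return warped[:, :2] / warped[:, 2:3]

def matching_precision(kp_a, kp_b, matches, H, threshold=3.0):
    """Fraction of matches that are correct under the homography H.

    kp_a, kp_b: (Na, 2) and (Nb, 2) keypoint coordinates in images A and B.
    matches:    (K, 2) integer index pairs (index into kp_a, index into kp_b).
    H:          ground-truth homography mapping image A onto image B.
    threshold:  hypothetical reprojection error tolerance in pixels.
    """
    if len(matches) == 0:
        return 0.0
    warped = warp_points(kp_a[matches[:, 0]], H)
    error = np.linalg.norm(warped - kp_b[matches[:, 1]], axis=1)
    return float((error <= threshold).mean())
```

Recall would be computed analogously, dividing the number of correct matches by the number of ground-truth correspondences instead of the number of predicted matches.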

Potential Implications and Future Developments

The implications of LISRD extend to computer vision applications such as VR and AR localization and autonomous navigation, where adaptability to varying conditions is paramount. By allowing descriptors to adapt their invariance properties dynamically, LISRD makes matching more robust across diverse imaging conditions.

Future research could extend this approach to additional types of invariance or apply it in broader contexts. Further optimizing the disentanglement process to minimize redundancy across descriptors could also improve LISRD’s efficiency and widen its applicability.

In summary, this research advances both the theoretical understanding and the practical performance of local feature descriptors by selecting the appropriate invariance at runtime.