AAAI 2025 Conference Paper

Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition

FoL is a two-stage Visual Place Recognition (VPR) approach that improves image retrieval and re-ranking by focusing on reliable, discriminative local regions.

Changwei Wang, Shunpeng Chen, Yukun Song, Rongtao Xu, Zherui Zhang,
Jiguang Zhang, Haoran Yang, Yu Zhang, Kexue Fu, Shide Du, Zhiwei Xu, Longxiang Gao, Li Guo, Shibiao Xu

Overview of FoL. The method discovers reliable discriminative local regions and uses them for global retrieval and local re-ranking.

Overview

FoL addresses how to identify reliable discriminative regions for VPR, since not every local patch remains useful under viewpoint, illumination, and seasonal changes. Rather than relying only on whole-image similarity, FoL first retrieves candidate places with global descriptors, then uses stable, informative local regions for fine-grained re-ranking. This makes the final match decision more robust in visually ambiguous scenes.

Highlights

Reliable local evidence

FoL focuses matching on discriminative regions instead of treating every patch as equally useful.

Two-stage VPR pipeline

Global retrieval supplies efficient candidates, while local re-ranking improves precision for hard cases.

Strong benchmark coverage

Experiments report Recall@K on common VPR datasets including Pitts, MSLS, Tokyo24/7, Nordland, SF-XL, SVOX, and Eynsham.

Open implementation

The repository includes training and evaluation code, pretrained weights, Torch Hub loading, and visualization scripts.

Method Overview

1. Global retrieval

FoL uses a DINOv2-based visual backbone and aggregation module to produce compact image descriptors for first-stage retrieval.

2. Discriminative region guidance

The model searches for stable, informative local regions that are more likely to support true place correspondence.

3. Local re-ranking

Candidate matches are refined with region-aware local evidence, improving robustness in visually ambiguous or condition-shifted scenes.
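
The two stages above can be illustrated with a small, standard-library-only sketch. It is not the FoL implementation: the function names, the toy descriptors, and the per-patch reliability weights are all hypothetical, and reliability-weighted mutual nearest-neighbour counting is a generic stand-in for whatever region-aware scoring the actual model uses. Stage 1 ranks database images by cosine similarity of global descriptors; stage 2 reorders the top candidates by local evidence.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve_top_k(query_desc, db_descs, k):
    """Stage 1: rank database images by cosine similarity of global descriptors."""
    q = l2_normalize(query_desc)
    scores = [(dot(q, l2_normalize(d)), i) for i, d in enumerate(db_descs)]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores[:k]]

def nearest(feat, others):
    """Index of the feature in `others` most similar to `feat`."""
    return max(range(len(others)), key=lambda j: dot(feat, others[j]))

def local_score(q_feats, q_weights, c_feats, c_weights):
    """Sum of reliability-weighted mutual nearest-neighbour matches (assumed scoring)."""
    score = 0.0
    for i, f in enumerate(q_feats):
        j = nearest(f, c_feats)
        if nearest(c_feats[j], q_feats) == i:  # keep only mutual matches
            score += q_weights[i] * c_weights[j]
    return score

def rerank(q_feats, q_weights, candidates, db_feats, db_weights):
    """Stage 2: reorder candidates by local score; ties keep the global order."""
    scored = [(local_score(q_feats, q_weights, db_feats[c], db_weights[c]), c)
              for c in candidates]
    scored.sort(key=lambda s: -s[0])
    return [c for _, c in scored]

# Toy data (hypothetical): 3 database images, 2 local patches per image.
db_descs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
db_feats = [
    [[0.0, 1.0], [0.6, 0.8]],
    [[1.0, 0.0], [0.6, 0.8]],
    [[0.0, 1.0], [0.0, 1.0]],
]
db_weights = [[0.2, 0.5], [1.0, 0.5], [0.2, 0.2]]

query_desc = [1.0, 0.05]
q_feats = [[1.0, 0.0], [0.0, 1.0]]
q_weights = [1.0, 0.2]  # first patch treated as reliable, second as unreliable

candidates = retrieve_top_k(query_desc, db_descs, k=2)
reranked = rerank(q_feats, q_weights, candidates, db_feats, db_weights)
print(candidates, reranked)  # [0, 1] [1, 0]
```

In this toy case the global ranking slightly prefers image 0, but image 1 shares the query's high-reliability patch, so re-ranking moves it to the top. A practical system would run the same logic on batched tensors rather than Python lists.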

Results

FoL is evaluated against representative VPR methods on standard benchmarks. The results emphasize the benefit of local re-ranking, especially on challenging condition-shifted datasets.

Recall@K comparison on Pitts250k-test, MSLS-val, MSLS-challenge, and Tokyo24/7. FoL reports both global retrieval and re-ranking results.
Additional Recall@1 comparison on Nordland, AmsterTime, SF-XL, SVOX, and SPED, covering large appearance, weather, and viewpoint changes.
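
For readers unfamiliar with the metric, Recall@K is the fraction of queries for which at least one correct database image appears among the top K retrieved results. The sketch below assumes the ground-truth positives for each query are already given as a set of database indices; how those positives are defined is up to each benchmark and is not covered here. The function name and toy data are illustrative only.

```python
def recall_at_k(ranked_lists, ground_truth, k):
    """Fraction of queries with at least one correct database image in the top k."""
    hits = 0
    for ranked, positives in zip(ranked_lists, ground_truth):
        if any(idx in positives for idx in ranked[:k]):
            hits += 1
    return hits / len(ranked_lists)

# Toy example: 3 queries, each with a ranked list of database indices.
rankings = [[3, 7, 1], [4, 2, 9], [5, 0, 8]]
positives = [{3}, {9, 6}, {1}]

r1 = recall_at_k(rankings, positives, 1)  # only the first query hits at rank 1
r3 = recall_at_k(rankings, positives, 3)  # the second query hits at rank 3
print(r1, r3)
```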

Visualization

Colab Demo
Raw matching results (FoL + similarity filtering).
Geometry-consistent matches after RANSAC (outliers removed).
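
The outlier-removal step shown in the second visualization can be sketched as a minimal RANSAC loop. This page does not state which geometric model the demo fits, so the sketch assumes a 2D similarity transform (scale, rotation, translation) estimated from two correspondences, chosen only because it keeps the example short. Points are represented as complex numbers so that the transform is `z -> a*z + b`. All names, thresholds, and coordinates are hypothetical.

```python
import random

def fit_similarity(p1, p2, q1, q2):
    """Similarity transform z -> a*z + b mapping p1 to q1 and p2 to q2."""
    if p1 == p2:
        return None  # degenerate sample
    a = (q1 - q2) / (p1 - p2)
    b = q1 - a * p1
    return a, b

def ransac_similarity(src, dst, threshold=3.0, iterations=200, seed=0):
    """Return indices of matches consistent with the best transform found."""
    rng = random.Random(seed)
    src_c = [complex(x, y) for x, y in src]
    dst_c = [complex(x, y) for x, y in dst]
    best = []
    for _ in range(iterations):
        i, j = rng.sample(range(len(src_c)), 2)
        model = fit_similarity(src_c[i], src_c[j], dst_c[i], dst_c[j])
        if model is None:
            continue
        a, b = model
        # A match is an inlier if its reprojection error is below the threshold.
        inliers = [k for k, (p, q) in enumerate(zip(src_c, dst_c))
                   if abs(a * p + b - q) < threshold]
        if len(inliers) > len(best):
            best = inliers
    return best

# Toy matches: the first six follow a pure shift of (+10, +5); the last two are outliers.
src = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5), (20, 5), (3, 8), (15, 2)]
dst = [(10, 5), (20, 5), (10, 15), (20, 15), (15, 10), (30, 10), (90, -40), (-50, 70)]

inliers = ransac_similarity(src, dst)
print(inliers)
```

Image-matching pipelines more commonly fit a homography or fundamental matrix at this step; the sample-fit-count structure of the loop stays the same.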

Extended Work

Region Matters: Efficient and Reliable Region-Aware Visual Place Recognition

FoL is extended in our follow-up work, Region Matters, which explores more efficient and reliable region-aware representations for VPR.

Contact

For questions, contact: shunpengchen@bupt.edu.cn

BibTeX

@inproceedings{FoL,
  title={Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition},
  author={Wang, Changwei and Chen, Shunpeng and Song, Yukun and Xu, Rongtao and Zhang, Zherui and Zhang, Jiguang and Yang, Haoran and Zhang, Yu and Fu, Kexue and Du, Shide and others},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={7},
  pages={7536--7544},
  year={2025}
}