ICLR 2026 Conference Paper

SAGE: Spatial-visual Adaptive Graph Exploration for Efficient Visual Place Recognition

SAGE learns efficient Visual Place Recognition (VPR) through dynamic geo-visual graph exploration, hard neighborhood mining, and lightweight parameter-efficient adaptation on a frozen DINOv2 backbone.

Shunpeng Chen, Changwei Wang, Rongtao Xu, Xingtian Pei, Yukun Song,
Jinzhou Lin, Wenhao Xu, Jingyi Zhang, Li Guo, Shibiao Xu


Figure: Overview of the SAGE architecture. The framework updates the online geo-visual graph during training and combines hard neighborhood mining with Soft Probing.

Overview

SAGE addresses the limits of static sampling policies in VPR. During training, it continuously reconstructs an online geo-visual graph so that sample mining follows the model's evolving embedding space. A greedy weighted clique expansion sampler mines informative spatial-visual neighborhoods, while Soft Probing amplifies discriminative local patches before aggregation.

Highlights

Adaptive graph exploration

Training neighborhoods are refreshed online instead of being fixed before optimization.

Hard sample mining

A greedy weighted clique expansion sampler focuses training on challenging geo-visual neighborhoods.

Soft Probing

A lightweight residual weighting module boosts discriminative local patches before aggregation.

Parameter efficiency

The released model keeps the DINOv2 backbone frozen and trains a compact set of adaptation parameters.

Method Overview

1. Online graph reconstruction

SAGE repeatedly rebuilds a spatial-visual graph during training to track the current descriptor geometry.
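The rebuild step can be pictured with a minimal sketch. This page does not specify how edges are defined, so the rule used here (connect two images when they lie within `radius_m` metres of each other, and weight the edge by the cosine similarity of their current descriptors) is an assumption for illustration only. The names `build_geo_visual_graph` and `radius_m` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0

def build_geo_visual_graph(positions, descriptors, radius_m=25.0):
    """Return {i: {j: weight}} for images that are geographically close.

    Assumed edge rule (not taken from the paper): an edge exists when two
    images are within radius_m metres, and its weight is the visual
    similarity under the *current* descriptors.
    """
    n = len(positions)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= radius_m:
                weight = cosine(descriptors[i], descriptors[j])
                graph[i][j] = weight
                graph[j][i] = weight
    return graph

# Same positions, descriptors taken at two different points in training.
positions = [(0.0, 0.0), (10.0, 0.0), (1000.0, 0.0)]
early = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
later = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
graph_early = build_geo_visual_graph(positions, early)
graph_later = build_geo_visual_graph(positions, later)
```

Calling the function again with fresh descriptors is what makes the graph "online" in this sketch: the geographic edges stay put while the weights follow the model. The double loop is quadratic in the number of images, so a real pipeline would likely need a spatial index or approximate nearest-neighbour search.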

2. Clique expansion mining

Hard and informative neighborhoods are selected with a greedy weighted clique expansion strategy.
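One way to read "greedy weighted clique expansion" is sketched below. The gain function (total edge weight from a candidate to the current clique), the tie-breaking rule, and the names `greedy_clique_expansion` and `max_size` are assumptions made for illustration; the paper's sampler may differ in all of these.

```python
def greedy_clique_expansion(graph, seed, max_size):
    """Grow a clique from `seed` by repeatedly adding the best candidate.

    `graph` is a symmetric adjacency map {node: {neighbor: weight}}.
    A candidate must be adjacent to every node already in the clique.
    Assumed gain: the sum of edge weights from the candidate to the clique.
    """
    clique = [seed]
    candidates = set(graph[seed])
    while candidates and len(clique) < max_size:
        # sorted() makes ties deterministic: the smallest node id wins.
        best = max(sorted(candidates),
                   key=lambda c: sum(graph[c][m] for m in clique))
        clique.append(best)
        # Keep only candidates that are also adjacent to the new member.
        candidates = {c for c in candidates if c != best and c in graph[best]}
    return clique

def symmetric(edges):
    """Build a symmetric adjacency map from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph

toy_graph = symmetric([(0, 1, 0.9), (0, 2, 0.8), (1, 2, 0.7), (0, 3, 0.5)])
neighborhood = greedy_clique_expansion(toy_graph, seed=0, max_size=3)
```

In the toy graph, node 3 is linked only to the seed, so it is dropped from the candidate set as soon as node 1 joins the clique. Being greedy, the procedure does not guarantee a maximum-weight clique; it trades optimality for a cost that is low enough to repeat whenever the graph is rebuilt.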

3. Efficient representation learning

Soft Probing (SoftP) and parameter-efficient fine-tuning strengthen local evidence while the DINOv2 backbone stays frozen.
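A residual weighting of local patches could look like the sketch below. The exact form of Soft Probing is not given on this page, so the scoring rule (a sigmoid over a linear probe `w`, `b`), the residual form `x * (1 + score)`, and the mean pooling used as a stand-in aggregator are all assumptions, and the function names are hypothetical.

```python
import math

def soft_probe(patches, w, b=0.0):
    """Reweight patch features with a residual gate.

    Each patch x gets a score in (0, 1) from a linear probe, and the
    output is x * (1 + score), so no patch is ever suppressed below its
    original magnitude. In this sketch, w and b would be the only
    trainable values; the patch features come from a frozen backbone.
    """
    reweighted = []
    for x in patches:
        logit = sum(wi * xi for wi, xi in zip(w, x)) + b
        score = 1.0 / (1.0 + math.exp(-logit))
        reweighted.append([xi * (1.0 + score) for xi in x])
    return reweighted

def mean_pool(patches):
    """Stand-in aggregator: average the patches per feature dimension."""
    return [sum(column) / len(patches) for column in zip(*patches)]

patch_features = [[1.0, 0.0], [0.0, 1.0]]
probe_weights = [2.0, -2.0]  # favours the first feature dimension
descriptor = mean_pool(soft_probe(patch_features, probe_weights))
```

With these probe weights the first patch is amplified more than the second, so it contributes more to the pooled descriptor. The residual form keeps the module close to an identity mapping when the probe is uninformative, which is one plausible reason for such a design.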

Results

SAGE is evaluated across VPR benchmarks with multiple descriptor dimensions and model variants. The figures below summarize the main comparison, extended benchmark coverage, and parameter-efficiency analysis.

Figure: Main benchmark comparison on SPED, Pitts30k-test, MSLS-val, and Nordland under different descriptor dimensions.

Figure: Additional benchmark comparison on AmsterTime, Tokyo24/7, Pitts250k-test, and Eynsham.

Figure: Comparison with recent backbone and VPR methods, including VLAD-BuFF, EDTFormer, MegaLoc, ImAge, EffoVPR, FoL, and SelaVPR++, on SPED, MSLS-val, Nordland, Tokyo24/7, and Pitts250k-test.

Figure: SAGE-B and SAGE-L (ViT-B and ViT-L) results across common VPR benchmarks, without InteractHead, at 322 × 322. Nordland⋆ uses 2,760 summer queries against a 27,592 winter database; Nordland uses 27,592 winter queries against a 27,592 summer database.

Figure: Parameter counts (in millions) for VPR methods with DINOv2-B. Values in parentheses are the parameters in the optional cross-image encoder.

Visualization

Figure: Qualitative retrieval and feature activation analysis. The left panel compares retrievals, while the right panel visualizes responses from SALAD, CFP, and SoftP.

Contact

For questions, contact: shunpengchen@bupt.edu.cn

BibTeX

@inproceedings{SAGE,
  title={SAGE: Spatial-visual Adaptive Graph Exploration for Efficient Visual Place Recognition},
  author={Shunpeng Chen and Changwei Wang and Rongtao Xu and Xingtian Pei and Yukun Song and Jinzhou Lin and Wenhao Xu and Jingyi Zhang and Li Guo and Shibiao Xu},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=DCpbEXqPvS}
}