PDPR: Panoramic-Depth Place Recognition through the fusion of visual and geometric-aware features

1Institute for Engineering Research (I3E), Miguel Hernandez University of Elche (Spain)
2Valencian Graduate School and Research Network for Artificial Intelligence (valgrAI)

General outline of PDPR. First, relative depth maps are obtained from panoramic images using Depth Anything v2 [1]. Second, both images and depth maps are transformed into embeddings through the same frozen VPR model. Third, visual and depth embeddings are merged into a global descriptor through late fusion.
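The three steps above can be summarized in a few lines of PyTorch. This is a minimal sketch under stated assumptions, not the released PDPR code: `estimate_depth` stands in for Depth Anything v2 [1], `vpr_model` for any frozen pre-trained VPR backbone, and descriptor concatenation is used as an illustrative late-fusion operator.

```python
import torch
import torch.nn.functional as F

def pdpr_descriptor(rgb_pano, estimate_depth, vpr_model):
    """Global descriptor for a batch of panoramas (B, 3, H, W).

    estimate_depth: monocular depth estimator (stand-in for Depth Anything v2).
    vpr_model: frozen VPR backbone mapping images to (B, D) embeddings.
    """
    depth = estimate_depth(rgb_pano)          # (B, 1, H, W) relative depth
    depth_img = depth.repeat(1, 3, 1, 1)      # 3 channels for the RGB-trained model
    with torch.no_grad():                     # the same frozen model embeds both modalities
        f_visual = vpr_model(rgb_pano)        # (B, D) appearance embedding
        f_depth = vpr_model(depth_img)        # (B, D) geometry-aware embedding
    # Late fusion: merge both embeddings into one global descriptor.
    fused = torch.cat([f_visual, f_depth], dim=1)
    return F.normalize(fused, dim=1)          # L2-normalize for cosine retrieval
```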

Abstract

Omnidirectional cameras are a suitable and cost-effective choice for Visual Place Recognition (VPR), as they provide comprehensive information about the scene regardless of the robot's orientation. However, vision sensors are vulnerable to changes in environmental appearance (e.g., illumination, season). While multi-modal approaches can overcome these challenges, they introduce significant cost and system complexity. This paper introduces a novel fusion framework that enhances VPR robustness by integrating visual data with geometric features derived from monocular depth estimation, while retaining a single-camera setup. In an ablation study, both early and late fusion strategies are evaluated to optimally combine appearance-based and depth-derived descriptors. An extensive evaluation on challenging indoor and outdoor datasets demonstrates that the proposed method consistently boosts retrieval performance across multiple state-of-the-art VPR backbones. Furthermore, this improvement is achieved without end-to-end retraining, allowing our method to function as a pluggable module for pre-trained models. Consequently, this work presents a powerful, practical, and low-cost solution for robust VPR, with high potential to scale as monocular depth estimation and VPR models continue to improve.
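For context on the two strategies compared in the ablation, the snippet below sketches the early-fusion alternative (late fusion is sketched above, under the outline figure). The 4-channel stacked input and the `vpr_model_4ch` name are illustrative assumptions, not the authors' implementation: an RGB-pretrained backbone must be adapted to accept the extra depth channel, which is one reason input-level fusion cannot simply reuse a frozen model.

```python
import torch
import torch.nn.functional as F

def early_fusion_descriptor(rgb, depth, vpr_model_4ch):
    """Input-level (early) fusion: a single forward pass over a stacked tensor.

    rgb: (B, 3, H, W) panorama; depth: (B, 1, H, W) relative depth map.
    vpr_model_4ch: hypothetical backbone whose first layer accepts 4 channels.
    """
    x = torch.cat([rgb, depth], dim=1)            # (B, 4, H, W) stacked input
    return F.normalize(vpr_model_4ch(x), dim=1)   # one fused global descriptor
```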

Depth Estimation and Processing

COLD database [2]

We propose PDPR, a novel fusion framework that enhances VPR robustness by integrating visual data with depth maps, which are preprocessed to make them suitable inputs for VPR models.

360Loc database [3]

Depth maps are obtained by means of Depth Anything v2 [1], a state-of-the-art depth estimation model, and preprocessed through various techniques to ensure compatibility with the pre-trained knowledge of the VPR model and to enhance the geometric information captured.
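The exact preprocessing techniques are described in the paper; the snippet below illustrates one plausible step of this kind, assuming min-max normalization followed by an OpenCV colormap, which turns a single-channel relative depth map into a 3-channel image that an RGB-pretrained VPR model can ingest.

```python
import numpy as np
import cv2

def depth_to_vpr_input(depth: np.ndarray) -> np.ndarray:
    """Convert a (H, W) relative depth map to an (H, W, 3) uint8 image."""
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)  # min-max to [0, 1]
    d8 = (255 * d).astype(np.uint8)
    # A perceptual colormap spreads depth gradients over three channels,
    # closer to the natural-image statistics the VPR model was trained on.
    return cv2.applyColorMap(d8, cv2.COLORMAP_INFERNO)
```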

Trajectory Results

COLD database [2]

FR-A Environment - Query Cloudy / Database Cloudy

FR-A Environment - Query Night / Database Cloudy

SA-A Environment - Query Cloudy / Database Cloudy

SA-B Environment - Query Sunny / Database Cloudy

360Loc dataset [3]

Atrium Environment - Query Day / Database Day

Atrium Environment - Query Night / Database Day

Concourse Environment - Query Day / Database Day

Hall Environment - Query Night / Database Day

Highlighted Examples

COLD database [2]

360Loc database [3]

Quantitative Results


R@1 results achieved by different VPR models with RGB images (no fusion) and with our method (PDPR).
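For reference, R@1 follows the standard place-recognition definition; the sketch below is a generic implementation, not the authors' evaluation code. A query counts as correct if its single nearest database descriptor corresponds to a place within the ground-truth tolerance.

```python
import numpy as np

def recall_at_1(q_desc, db_desc, gt_matches):
    """q_desc: (Q, D) and db_desc: (N, D) L2-normalized descriptors;
    gt_matches: per-query set of database indices counted as correct."""
    sims = q_desc @ db_desc.T            # cosine similarities (Q, N)
    top1 = sims.argmax(axis=1)           # index of nearest database image
    hits = sum(int(top1[i] in gt_matches[i]) for i in range(len(gt_matches)))
    return hits / len(gt_matches)
```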

References

  1. Yang, L., Kang, B., Huang, Z., Zhao, Z., Xu, X., Feng, J., & Zhao, H. (2024). Depth anything v2. Advances in Neural Information Processing Systems, 37, 21875-21911.
  2. Pronobis, A., & Caputo, B. (2009). COLD: The CoSy localization database. The International Journal of Robotics Research, 28(5), 588-594.
  3. Huang, H., Liu, C., Zhu, Y., Cheng, H., Braud, T., & Yeung, S. K. (2024). 360Loc: A dataset and benchmark for omnidirectional visual localization with cross-device queries. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 22314-22324).

How to cite this work

Coming soon.

Acknowledgements


The Ministry of Science, Innovation and Universities (Spain) has funded this work through FPU23/00587 (M. Alfaro) and FPU21/04969 (J.J. Cabrera). This work is part of the projects PID2023-149575OB-I00, funded by MICIU/AEI/10.13039/501100011033 and by FEDER UE, and CIPROM/2024/8, funded by Generalitat Valenciana.