A Multimodal Geospatially-Aware Image Retrieval Framework for Non-Geotagged Image Localization Using Contrastive Vision-Language Learning

S. Pratap Singh; Dr. Ch. Bindu Madhuri; Dr. P. Satheesh

Published: May 12, 2026

S. Pratap Singh

Research Scholar, Department of Computer Science and Engineering, JNT University, Kakinada, AP, India.

Dr. Ch. Bindu Madhuri

Head of Department, Department of Information Technology, JNTUK, Vizianagaram, AP, India.

Dr. P. Satheesh

Professor, Department of Computer Science and Engineering, MVGR College of Engineering, Vizianagaram, AP, India.

Abstract

The large-scale growth of digital image collections across mobile platforms, online media, and public repositories has created significant demand for intelligent retrieval systems capable of understanding visual content together with its geographic context. Existing image retrieval approaches mainly rely on semantic feature similarity and often neglect spatial relationships, reducing their effectiveness for geospatial reasoning and location inference tasks. This work presents GeoCLIP-BLIP, a multimodal framework for retrieving and localizing non-geotagged images through combined semantic and geographic representation learning. The proposed approach integrates CLIP to extract semantic visual embeddings, a lightweight geographic encoding module to capture spatial information from coordinate data, and BLIP to generate descriptive captions that improve interpretability. Using a geo-referenced image database, the framework identifies visually related samples and estimates the probable geographic location of an input query image through similarity-based ranking. Retrieved results are further presented through an interactive map interface for intuitive spatial visualization. Experimental evaluation shows that the proposed framework achieves better retrieval relevance and geographic consistency than conventional CLIP-based retrieval methods. By combining semantic feature extraction, spatial embedding fusion, and caption-based explanation, GeoCLIP-BLIP provides an efficient solution for multimodal geospatial image retrieval and localization of non-geotagged images.
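The abstract describes two steps: similarity-based ranking against a geo-referenced database, then estimation of the query image's probable location. The Python sketch below shows one way those steps could fit together. It is not the authors' implementation. It assumes the CLIP image embeddings have already been computed, and represents them here as plain float lists. It ranks candidates by cosine similarity. In place of the paper's unspecified geographic encoding module, it uses a hypothetical encoding that maps latitude and longitude to a point on the unit sphere. The location estimate is a similarity-weighted spherical mean of the top-k matches. The names `geo_encode`, `geo_decode`, and `retrieve_and_localize`, the parameter `k`, and the toy database entries are all illustrative.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical database entry: (image_id, semantic_embedding, latitude_deg, longitude_deg).
# In GeoCLIP-BLIP the embeddings would come from CLIP; here they are plain float lists.
Entry = Tuple[str, Sequence[float], float, float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def geo_encode(lat: float, lon: float) -> Tuple[float, float, float]:
    """Assumed geographic encoding: degrees -> (x, y, z) on the unit sphere."""
    phi, lam = math.radians(lat), math.radians(lon)
    return (math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam), math.sin(phi))


def geo_decode(x: float, y: float, z: float) -> Tuple[float, float]:
    """Inverse of geo_encode; the vector does not need to be unit length."""
    return math.degrees(math.atan2(z, math.hypot(x, y))), math.degrees(math.atan2(y, x))


def retrieve_and_localize(
    query: Sequence[float], database: List[Entry], k: int = 3
) -> Tuple[List[Tuple[str, float]], Tuple[float, float]]:
    """Rank database images by semantic similarity to a non-geotagged query, then
    estimate its location as the similarity-weighted spherical mean of the top-k."""
    if not database:
        raise ValueError("database is empty")
    scored = sorted(
        ((cosine(query, emb), img_id, lat, lon) for img_id, emb, lat, lon in database),
        key=lambda t: t[0],
        reverse=True,
    )
    top = scored[:k]
    sx = sy = sz = 0.0
    for sim, _, lat, lon in top:
        w = max(sim, 0.0)  # negatively similar matches contribute nothing
        x, y, z = geo_encode(lat, lon)
        sx, sy, sz = sx + w * x, sy + w * y, sz + w * z
    if sx == sy == sz == 0.0:
        # No usable weight (or exact antipodal cancellation): fall back to the best match.
        estimate = (top[0][2], top[0][3])
    else:
        estimate = geo_decode(sx, sy, sz)
    return [(img_id, sim) for sim, img_id, _, _ in top], estimate


# Toy usage with made-up 3-dimensional "embeddings".
db = [
    ("paris_1", [0.9, 0.1, 0.0], 48.8584, 2.2945),
    ("paris_2", [0.8, 0.2, 0.1], 48.8606, 2.3376),
    ("tokyo_1", [0.0, 0.1, 0.9], 35.6586, 139.7454),
]
ranked, (est_lat, est_lon) = retrieve_and_localize([0.85, 0.15, 0.05], db, k=2)
print(ranked)              # the two Paris images are the top-2 matches
print(est_lat, est_lon)    # an estimate lying between the two Paris coordinates
```

The sketch averages coordinates on the sphere rather than averaging raw latitude and longitude values. This keeps the estimate sensible when matches fall on both sides of the ±180° meridian. It does not model the semantic-spatial embedding fusion or the BLIP captioning step described in the abstract.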

Issue

Vol. 25 No. 1 (2026)

Section

Articles
