
This study explores the capabilities of multimodal foundation models (MFMs), specifically CLIP and GPT-4V, in predicting human perceptions of urban environments from geospatial data. By integrating multimodal inputs such as street view imagery and textual descriptions, the research aims to overcome the limitations of unimodal approaches and dataset-specific models. Using a dataset from Songdo, South Korea, the study assesses six key perception variables: safety, liveliness, boredom, wealth, depression, and beauty. The results show that MFMs hold significant potential for zero-shot urban perception prediction, but also highlight the need for improved multimodal integration and further methodological advances. This work paves the way for more robust GeoAI applications in urban planning and design.
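The zero-shot setup described above can be illustrated with a small sketch. In CLIP-style zero-shot prediction, an image embedding is compared against text-prompt embeddings (one per perception variable) via cosine similarity, and a softmax turns the similarities into scores. The sketch below uses random placeholder vectors in place of real CLIP embeddings, and the prompt list and temperature value are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

# The six perception variables assessed in the study.
PERCEPTIONS = ["safety", "liveliness", "boredom", "wealth", "depression", "beauty"]

def zero_shot_scores(image_emb, text_embs, temperature=0.01):
    """Softmax over cosine similarities between one image embedding
    and one text-prompt embedding per perception variable."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = (img @ txt.T) / temperature
    logits -= logits.max()                     # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return dict(zip(PERCEPTIONS, probs))

# Placeholder vectors; a real pipeline would obtain these from a
# pretrained CLIP image encoder and text encoder.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)               # stand-in: street view image embedding
text_embs = rng.normal(size=(6, 512))          # stand-ins: prompt embeddings
scores = zero_shot_scores(image_emb, text_embs)
```

The key point is that no perception-labeled training data is needed: the text prompts alone define the prediction targets, which is what makes the approach "zero-shot."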
Autor / Author: | Han, Soyoung; Kim, Soyoung |
Institution / Institution: | Incheon National University, Incheon/South Korea; Incheon National University, Incheon/South Korea |
Seitenzahl / Pages: | 8 |
Sprache / Language: | English |
Veröffentlichung / Publication: | JoDLA – Journal of Digital Landscape Architecture, 10-2025 |
Tagung / Conference: | Digital Landscape Architecture 2025 – Collaboration |
Veranstaltungsort, -datum / Venue, Date: | Dessau Campus of Anhalt University, Germany 04-06-25 - 07-06-25 |
Schlüsselwörter (de): | |
Keywords (en): | GeoAI, multimodal foundation models (MFMs), urban perception prediction |
Paper review type: | Full Paper Review |
DOI: | doi:10.14627/537754061 |