Advancing Geospatial Foundation Models: Generative Representations and Global Benchmarking
Time: Tue 2026-06-09 10.00
Location: E32, Lindstedtvägen 3, Campus, public video conference
Language: English
Subject area: Geodesy and Geoinformatics, Geoinformatics
Doctoral student: Yuru Jia, Geoinformatik, KU Leuven
Opponent: Professor Sébastien Lefèvre, Université Bretagne Sud (UBS), Vannes, France
Supervisor: Professor Yifang Ban, Geoinformatik; Associate Professor Andrea Nascetti, Geoinformatik
QC 20260528
Abstract
The explosion of Earth Observation (EO) data has driven the rapid development of Geospatial Foundation Models (GFMs) trained via self-supervised learning (SSL). While current discriminative SSL paradigms, such as contrastive learning and masked image modeling, have achieved notable success, they often struggle to capture the fine-grained spatial details and multi-scale complexities inherent in satellite imagery. Furthermore, the rapid architectural advancement of these models has outpaced the methodology used to evaluate them. Existing benchmarks frequently suffer from geographic biases, lack multi-modal and multi-temporal diversity, and rely on overly simplistic image-level classification tasks, thereby obscuring the true real-world capabilities and vulnerabilities of modern GFMs.
To address these representational limitations, this thesis first investigates the potential of generative diffusion models for discriminative representation learning. We introduce SatDiFuser, a framework that repurposes a large-scale, pre-trained latent diffusion model for dense remote sensing tasks. SatDiFuser extracts multi-scale, multi-timestep features from the iterative denoising process and aggregates them through fusion strategies that include Global Weighting, Localized Weighting, and a Mixture of Experts (MoE) mechanism. In doing so, it transforms generative spatial priors into robust discriminative features, demonstrating superior performance on standard geospatial benchmarks.
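As a rough illustration of what weighting-based fusion of multi-scale, multi-timestep features could look like, the following minimal NumPy sketch combines a stack of K feature maps with either one scalar weight per map (a global scheme) or one weight per map and pixel (a localized scheme). This is not the SatDiFuser implementation: the function names, the array shapes, and the assumption that all maps have already been resized to a common shape are hypothetical, and in a trained model the logits would be learned rather than supplied by hand.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def global_weighting(features, logits):
    """Fuse K feature maps with one scalar weight per map (hypothetical helper).

    features: array of shape (K, C, H, W), assumed already resized to a common shape.
    logits:   array of shape (K,), one unnormalized weight per feature map.
    Returns an array of shape (C, H, W).
    """
    weights = softmax(logits, axis=0)                    # (K,), sums to 1
    return np.tensordot(weights, features, axes=(0, 0))  # weighted sum over K


def localized_weighting(features, logits_map):
    """Fuse K feature maps with one weight per map and pixel (hypothetical helper).

    features:   array of shape (K, C, H, W).
    logits_map: array of shape (K, H, W), unnormalized per-pixel weights.
    Returns an array of shape (C, H, W).
    """
    weights = softmax(logits_map, axis=0)                # (K, H, W), sums to 1 over K
    return np.sum(weights[:, None, :, :] * features, axis=0)


# Example: three feature maps with 2 channels on a 4x4 grid.
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 2, 4, 4))
fused_global = global_weighting(feats, np.zeros(3))
fused_local = localized_weighting(feats, np.zeros((3, 4, 4)))
```

With all logits equal, both functions reduce to a plain average over the K maps. The two schemes differ only once the weights vary, per map in the global case and per map and pixel in the localized case.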
While exploring novel representation architectures is crucial, accurately assessing the rapidly expanding GFM landscape requires overcoming a second fundamental challenge: the inadequacy of current evaluation protocols. To address this critical evaluation gap, this thesis subsequently introduces PANGAEA: a globally inclusive, standardized benchmark. Encompassing 11 diverse datasets across five critical domains (Urban, Agriculture, Disaster, Marine, and Forestry), PANGAEA evaluates models exclusively on complex, dense pixel-wise tasks while accounting for varying spatial resolutions, multi-modality (Optical and SAR), and multi-temporal dynamics. Extensive benchmarking of representative GFMs reveals critical insights into their generalization capabilities, robustness under data scarcity, and current limitations in multi-sensor fusion. Ultimately, this thesis bridges the gap between generative representation learning and rigorous global evaluation, laying a robust foundation for the development and assessment of the next generation of Geospatial Foundation Models.