Euclid Quick Data Release (Q1) Exploring galaxy properties with a multi-modal foundation model M. Siudek, M. Huertas-Company, M. Smith, G. Martinez-Solaeche, F. Lanusse, S. Ho, E. Angeloudi, P. A. C. Cunha, H. Domínguez Sánchez, M. Dunn, Y. Fu, P. Iglesias-Navarro, J. Junais, J. H. Knapen, B. Laloux, M. Mezcua, W. Roster, G. Stevens, J. Vega-Ferrero, Euclid Consortium
arXiv Preprint
Keywords: foundation-models, computer-vision, euclid-consortium, classification

Modern astronomical surveys, such as the Euclid mission, produce high-dimensional, multi-modal data sets that include imaging and spectroscopic information for millions of galaxies. These data serve as an ideal benchmark for large, pre-trained multi-modal models, which can leverage vast amounts of unlabelled data. In this work, we present the first exploration of Euclid data with AstroPT, an autoregressive multi-modal foundation model trained on approximately 300,000 optical and infrared Euclid images and spectral energy distributions (SEDs) from the first Euclid Quick Data Release. We compare self-supervised pre-training with baseline fully supervised training across several tasks: galaxy morphology classification, redshift estimation, similarity searches, and outlier detection. Our results show that: (a) AstroPT embeddings are highly informative, correlating with morphology and effectively isolating outliers; (b) including infrared data helps to isolate stars but degrades the identification of edge-on galaxies, which are better captured by optical images; (c) simple fine-tuning of these embeddings for photometric redshift and stellar mass estimation outperforms a fully supervised approach, even when using only 1% of the training labels; and (d) incorporating SED data into AstroPT via a straightforward multi-modal token-chaining method improves photo-z predictions and allows us to identify potentially more interesting anomalies (such as ringed or interacting galaxies) than a model pre-trained solely on imaging data.
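The abstract mentions similarity searches and outlier detection in the AstroPT embedding space but does not spell out the procedure here. As a minimal, hypothetical sketch, assuming per-galaxy embeddings have already been extracted into a NumPy array, cosine-similarity nearest neighbours and a mean k-nearest-neighbour distance score are one common, generic way to do both. The function names, the choice of cosine distance, and the synthetic data are illustrative assumptions, not the paper's method or the AstroPT codebase.

```python
import numpy as np


def l2_normalise(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def similarity_search(embeddings: np.ndarray, query_idx: int, k: int = 5) -> np.ndarray:
    """Return indices of the k rows most similar to embeddings[query_idx], excluding itself."""
    z = l2_normalise(embeddings)
    sims = z @ z[query_idx]          # cosine similarity of every row to the query
    sims[query_idx] = -np.inf        # never return the query itself
    return np.argsort(-sims)[:k]


def knn_outlier_scores(embeddings: np.ndarray, k: int = 10) -> np.ndarray:
    """Mean cosine distance to the k nearest neighbours; larger means more isolated."""
    z = l2_normalise(embeddings)
    dist = 1.0 - z @ z.T             # pairwise cosine distances (O(N^2) memory)
    np.fill_diagonal(dist, np.inf)   # ignore each row's distance to itself
    nearest = np.sort(dist, axis=1)[:, :k]
    return nearest.mean(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for model embeddings: two tight clusters plus one far-away point.
    cluster_a = rng.normal(loc=[5, 0, 0, 0], scale=0.1, size=(50, 4))
    cluster_b = rng.normal(loc=[0, 5, 0, 0], scale=0.1, size=(50, 4))
    oddball = np.array([[0.0, 0.0, 0.0, 5.0]])
    emb = np.vstack([cluster_a, cluster_b, oddball])

    print(similarity_search(emb, query_idx=0, k=5))   # indices drawn from cluster_a
    scores = knn_outlier_scores(emb, k=10)
    print(int(np.argmax(scores)))                     # 100, the oddball row
```

The brute-force pairwise matrix keeps the sketch short; for hundreds of thousands of galaxies an approximate nearest-neighbour index would be the more practical choice.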
@misc{Siudek2025EuclidFoundation,
  author = {{Siudek}, M. and {Huertas-Company}, M. and {Smith}, M. and {Martinez-Solaeche}, G. and {Lanusse}, F. and {Ho}, S. and {Angeloudi}, E. and {Cunha}, P.~A.~C. and {Domínguez Sánchez}, H. and {Dunn}, M. and {Fu}, Y. and {Iglesias-Navarro}, P. and {Junais}, J. and {Knapen}, J.~H. and {Laloux}, B. and {Mezcua}, M. and {Roster}, W. and {Stevens}, G. and {Vega-Ferrero}, J. and the {Euclid Collaboration}},
  title = "{Euclid Quick Data Release (Q1) Exploring galaxy properties with a multi-modal foundation model}",
  year = {2025},
  eprint = {2503.15312},
  archivePrefix = {arXiv}
}