AstroVision Model for Imaging and Spectroscopy (AVIS)

Abstract

Astronomical datasets have grown dramatically in size and complexity over the past two decades. This growth offers vast opportunities for exploration, but the scarcity of labeled data makes the datasets hard to exploit. Self-Supervised Learning (SSL) addresses this issue by learning from unlabeled data. We present preliminary results from designing a foundation model, the AstroVision Model for Imaging and Spectroscopy (AVIS), that combines information from imaging and spectroscopy. The model is trained on DESI Y3 spectra and Legacy Imaging Survey (LS) DR9 and DR10 imaging data. Our analysis of the embedding vectors that the trained model generates in latent space shows that the model captures correlated features between images and spectra for each type of source (QSOs, galaxies, stars, etc.). The model can also perform redshift estimation without any supervisory information. We will examine these diagnostics and discuss further analysis that evaluates the model's understanding of astrophysical objects. Some potential downstream science applications will also be considered.

Clustering analysis of embedding vectors

We perform clustering analysis on the embedding vectors generated by the AVIS model for DESI Y3 sources. We use the Pairwise Controlled Manifold Approximation (PaCMAP) algorithm to project the high-dimensional embedding vectors into 2D and 3D spaces for visualization. The resulting clusters are clearly separated and carry distinct physical meanings. In the following figures, we color the points by different properties of the DESI sources to interpret the clustering results.

Clustering results with SPECTYPE

Clustering results with redshift

The 2D PaCMAP plot of embedding vectors colored by redshift.

By morphological types

The 2D PaCMAP plot of embedding vectors colored by morphology.

By half-light radius

The 2D PaCMAP plot of embedding vectors colored by half-light radius.

By stellar type

Sources are classified via spectral fitting into different stellar subtypes (A, F, G, K, M, WD, CV).

The 2D PaCMAP plot of embedding vectors colored by stellar types.

Clustering results with DESI Target classes

The 2D PaCMAP plot of embedding vectors colored by DESI Target classes.

The 3D interactive plot for clustering results with DESI Target classes

Click on this link to access the full interactive plot: Interactive Plot

Self clustering

We also perform self-clustering on the embedding vectors without any labels. We use the HDBSCAN algorithm to identify clusters in the embedding space. The results show that the model can group similar sources together based on their learned features.

The 2D PaCMAP plot of embedding vectors colored by HDBSCAN clustering results.