The Platonic Universe: Testing The Platonic Representation Hypothesis With Astronomical Data ๐ฎ ๐ซ
This repository contains the code for testing the Platonic Representation Hypothesis (PRH) on astronomical data, as described in our paper "The Platonic Universe: Do Foundation Models See the Same Sky?"
The Platonic Representation Hypothesis suggests that neural networks trained with different objectives on different data modalities converge toward a shared statistical model of reality in their representation spaces. As models become larger and are trained on more diverse tasks, they should converge toward a "Platonic ideal" representation of underlying reality.
Astronomical observations provide an ideal testbed for the PRH because:
- Shared Physical Origin: Different astronomical observations (images, spectra, photometry) all emerge from the same underlying physics
- Multiple Modalities: We can compare representations across fundamentally different data types (like optical images, infrared images, and spectroscopy)
- Scale: Modern astronomical surveys provide the data volume necessary to test convergence across multiple model architectures
Our results (below) show that larger models exhibit more similar representations, even when trained across different data modalities. This suggests that astronomical foundation models may be able to leverage pre-trained general-purpose architectures.
-
Clone the repository:
git clone [https://github.com/UniverseTBD/platonic-universe.git](https://github.com/UniverseTBD/platonic-universe.git) cd platonic-universe -
Install dependencies using uv:
pip install uv uv sync
-
Install the package
uv pip install .
To install without uv, you can create a virtual environment and install the requirements manually:
python -m venv pu_env source pu_env/bin/activate pip install . ## or to install with SAM2 support: pip install ".[sam2]"
There are two methods to run experiments:
-
Using platonic_universe CLI:
platonic_universe run --model vit --mode jwst
-
Using python pakage directly:
import pu pu.run_experiment("vit", "sdss", batch_size=64, num_workers=1, knn_k=10)
Models Tested:
- Vision Transformers (ViT): Base, Large, Huge
- DINOv2: Small, Base, Large, Giant
- ConvNeXtv2: Nano, Tiny, Base, Large
- IJEPA: Huge, Giant
- AstroPT: Astronomy-specific transformer (Small, Base, Large)
- Specformer: Spectroscopy-specific model (Enable by uncommenting relevant lines in 'experiments.py')
Astronomical Datasets:
- HSC (Hyper Suprime-Cam): Ground-based optical imaging (reference baseline)
- JWST: Space-based infrared imaging
- Legacy Survey: Ground-based optical imaging
- DESI: Spectroscopy
The code measures representational alignment using the Mutual k-Nearest Neighbour (MKNN) metric:
from pu.metrics import mknn
# Calculate MKNN score between two embedding sets
score = mknn(embeddings_1, embeddings_2, k=10)
print(f"MKNN alignment score: {score:.4f}")Higher MKNN scores indicate more similar representations between models or modalities.
Note: the validation used ConvNeXtv2-Base (facebook/convnextv2-base-22k-224) for the image side. If more validation across other sizes/families is wanted we can run it, but it shouldn't be required โ this confirms the SpecFormer adapter produces correct embeddings and the cross-modal MKNN is consistent with the paper.
Re: whether platonic_universe run --model specformer --mode desi gives the same result โ not directly. That command generates and saves the SpecFormer embeddings to parquet, but since specformer is spectral-only it skips HSC and metric computation. To reproduce this MKNN via the CLI you'd:
platonic_universe run --model specformer --mode desi โ fresh specformer embeddings
platonic_universe run --model convnext --mode desi โ ConvNeXtv2 HSC embeddings (+ pre-computed DESI embeddings from Smith42/specformer_desi)
platonic_universe compare <specformer_parquet> --ref <convnext_parquet> --mode desi
Step 2 already produces an MKNN using the pre-computed specformer embeddings from HuggingFace, so that's the quickest way to sanity-check. The validation script above computes embeddings fresh end-to-end as an independent check.
This project is open source under the AGPLv3.
We welcome contributions! Please feel free to open a pull request to:
- Add support for new model architectures
- Include additional astronomical datasets
- Implement alternative similarity metrics
- Improve preprocessing pipelines
We also hang out on the UTBD Discord, so feel free to reach out there!
If you use this code in your research, please cite:
@article{utbd2025,
author = {{UniverseTBD} and Duraphe, K. and Smith, M. J. and Sourav, S. and Wu, J. F.},
title = {{The Platonic Universe: Do Foundation Models See the Same Sky?}},
journal = {ArXiv e-prints},
year = {2025},
eprint = {2509.19453},
doi = {10.48550/arXiv.2509.19453}
}