We release the complete data processing pipeline, metadata, training scripts, and configs to exactly reproduce the final released version of MapAnything. We also provide training bash scripts for all main models and ablations reported in the paper.
Before training, set up all the WAI-format data by following the Data Processing README. Once converted, verify that the dataloaders work correctly, for example:
```bash
# Verify BlendedMVS dataloader with Rerun visualization
python mapanything/datasets/wai/blendedmvs.py \
    --root_dir /path/to/blendedmvs \
    --dataset_metadata_dir /path/to/metadata \
    --num_of_views 4 \
    --viz
```

See the main call in each dataloader file (e.g., `mapanything/datasets/wai/blendedmvs.py`) for more details, and use `--viz` for Rerun visualization. By default, our dataloaders support a varying number of views during training.
For quick experimentation, we provide a single dataset example using BlendedMVS (see bash_scripts/train/examples/mapa_curri_4v_bmvs_48ipg_8g.sh or bash_scripts/train/examples/mapa_img_only_4v_bmvs_48ipg_8g.sh).
Tips:
💡 Use `model.info_sharing.module_args.gradient_checkpointing=true` and `model.pred_head.gradient_checkpointing=true` to save GPU memory
💡 Adjust `max_num_of_imgs_per_gpu` to control memory usage and effective batch size
💡 Scale the learning rate proportionally when changing the batch size (effective batch size at a given number of views during training = `NUM_GPUS × max_num_of_imgs_per_gpu / num_of_views`)
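As a sanity check on the tips above, the effective batch size and a linearly scaled learning rate can be computed as follows (the GPU count, reference batch size, and reference learning rate below are hypothetical placeholders, not values taken from the released configs):

```shell
# Hypothetical values for illustration only (not from the released configs)
NUM_GPUS=64          # total GPUs across all nodes
MAX_IMGS_PER_GPU=36  # max_num_of_imgs_per_gpu
NUM_VIEWS=4          # num_of_views during this curriculum stage

# Effective batch size = NUM_GPUS * max_num_of_imgs_per_gpu / num_of_views
EFFECTIVE_BS=$(( NUM_GPUS * MAX_IMGS_PER_GPU / NUM_VIEWS ))
echo "effective batch size: ${EFFECTIVE_BS}"

# Linear LR scaling relative to a hypothetical reference setup
REF_BS=128
REF_LR=0.0001
SCALED_LR=$(awk -v lr="$REF_LR" -v bs="$EFFECTIVE_BS" -v ref="$REF_BS" \
    'BEGIN { printf "%.6f", lr * bs / ref }')
echo "scaled lr: ${SCALED_LR}"
```

With these placeholder numbers, 64 GPUs at 36 images per GPU and 4 views give an effective batch size of 576 multi-view samples per step.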
All original training bash scripts are available in bash_scripts/train/. The main models from the paper can be reproduced using:
```bash
# Stage 1: 4-view training - 8 nodes
bash bash_scripts/train/main/mapa_curri_4v_13d_36ipg_64g.sh 8

# Stage 2: 24-view training - 8 nodes
bash bash_scripts/train/main/mapa_curri_24v_13d_36ipg_64g.sh 8

# Stage 3: 24-view confidence training - 8 nodes
bash bash_scripts/train/main/mapa_v1_1.sh 8

# Likewise for the Apache 2.0 licensed variants - 8 nodes
bash bash_scripts/train/main/mapa_curri_4v_6d_36ipg_64g_apache.sh 8
bash bash_scripts/train/main/mapa_curri_24v_6d_36ipg_64g_apache.sh 8
bash bash_scripts/train/main/mapa_v1_1_apache.sh 8
```

Update the machine configuration in `configs/machine/your_machine.yaml` and adjust the paths in the bash scripts before execution.
We use DINO init for the multi-view transformer, and the default training config expects the weights to be present at `${machine.root_pretrained_checkpoints_dir}/aat_init_w_dinov2_vitg_layers_24_to_40.pth`. This checkpoint can be generated and saved to that path using UniCeption:

```bash
python3 scripts/convert_dino_to_info_sharing.py \
    --output_path ${machine.root_pretrained_checkpoints_dir}/aat_init_w_dinov2_vitg_layers_24_to_40.pth \
    --start 24 \
    --encoder_str dinov2_giant \
    --info_sharing_class alternating
```

We also provide bash scripts for all ablations studied in the paper (see `bash_scripts/train/ablations/`).
Beyond MapAnything training, our modular data and training pipeline can improve the performance of other models. We provide fine-tuning scripts for:

```bash
bash bash_scripts/train/finetuning/moge2_finetuning.sh 8
bash bash_scripts/train/finetuning/vggt_finetuning.sh 8
bash bash_scripts/train/finetuning/pi3_finetuning.sh 8
```

These scripts demonstrate how our comprehensive training data and pipeline can enhance existing 3D reconstruction models, including support for equivariant reference frames (i.e., the scheme used by π³).
We show the average performance on ETH3D, ScanNet++V2 & TartanAirV2-WB at a varying number of views for our finetuned variants of VGGT & π³, trained using the MapAnything data and training framework. Note that there might be data leakage for π³ on our ScanNet++V2 & TartanAirV2-WB benchmarks, since the original train scene splits are unknown. Despite this, our finetuned models achieve higher performance than the public π³ (finetuned on top of VGGT) & VGGT models:
MapAnything uses Hydra for configuration management. Key configuration categories:
- Overall: `configs/train.yaml` - Entrypoint
- Machine: `configs/machine/` - Path settings
- Dataset: `configs/dataset/` - Dataset selection and parameters
- Model: `configs/model/` - Model architecture configurations
- Loss: `configs/loss/` - Loss function definitions
- Train Params: `configs/train_params/` - Training hyperparameters
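Since Hydra composes these groups, individual values can also be overridden from the command line. A minimal sketch, assuming the entrypoint consumes `configs/train.yaml` (the script name `train.py` and the machine/override values shown are illustrative placeholders, not the released defaults):

```bash
# Hypothetical Hydra overrides (script name and values are placeholders)
python train.py \
    machine=your_machine \
    model.info_sharing.module_args.gradient_checkpointing=true \
    model.pred_head.gradient_checkpointing=true
```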
For large-scale training, use the multi-node configurations in the main training scripts:
```bash
# Example multi-node execution
bash bash_scripts/train/main/mapa_curri_24v_13d_48ipg_64g.sh \
    NUM_GPUS NUM_NODES NODE_RANK JOB_ID HOST_NODE_ADDR MAX_RESTARTS
```

The scripts include optimized settings for AWS multi-node training with EFA networking. These can be removed/updated based on your compute infrastructure.
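As an illustration of the positional arguments above, a two-node launch might look like the following (every value here is a hypothetical placeholder; substitute your own GPU count, job ID, and rendezvous host):

```bash
# Hypothetical 2-node launch: 8 GPUs per node, this is node rank 0,
# shared job id, rendezvous on node-0, up to 3 restarts on failure
bash bash_scripts/train/main/mapa_curri_24v_13d_48ipg_64g.sh \
    8 2 0 mapa_job_001 node-0.cluster.local 3
```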
The training scripts support all 13 training datasets (with appropriate splits) converted to WAI format:
- ✅ Aria Synthetic Environments
- ✅ BlendedMVS
- ✅ DL3DV-10K
- ✅ Dynamic Replica
- ✅ Mapillary Planet Scale Depth & Reconstructions (MPSD)
- ✅ MegaDepth (including Tanks & Temples)
- ✅ MVS-Synth
- ✅ Parallel Domain 4D
- ✅ SAIL-VOS 3D
- ✅ ScanNet++ v2
- ✅ Spring
- ✅ TartanAirV2 Wide Baseline
- ✅ UnrealStereo4K
To exactly reproduce the paper results:
- Process the datasets using the instructions in the Data Processing README
- Download pre-computed metadata from HuggingFace
- Run the main training scripts with your machine configuration
- Use the same hyperparameters and data splits as provided in the configs
Results will be saved to the configured `hydra.run.dir` directory with regular checkpointing and training logs.
