We release the complete data processing pipeline, metadata, training scripts, and configs to exactly reproduce the final released version of MapAnything. We also provide training bash scripts for all main models and ablations reported in the paper.
Before training, set up all the WAI-format data by following the Data Processing README. Once converted, verify that the dataloaders work correctly, for example:
```bash
# Verify BlendedMVS dataloader with Rerun visualization
python mapanything/datasets/wai/blendedmvs.py \
    --root_dir /path/to/blendedmvs \
    --dataset_metadata_dir /path/to/metadata \
    --num_of_views 4 \
    --viz
```

See the main call in each dataloader file (e.g., `mapanything/datasets/wai/blendedmvs.py`) for more details, and use `--viz` for Rerun visualization. By default, our dataloaders support a varying number of views during training.
For quick experimentation, we provide a single dataset example using BlendedMVS (see bash_scripts/train/examples/mapa_curri_4v_bmvs_48ipg_8g.sh or bash_scripts/train/examples/mapa_img_only_4v_bmvs_48ipg_8g.sh).
Tips:
💡 Use `model.info_sharing.module_args.gradient_checkpointing=true` and `model.pred_head.gradient_checkpointing=true` to save GPU memory
💡 Adjust `max_num_of_imgs_per_gpu` to control memory usage and effective batch size
💡 Scale the learning rate proportionally when changing the batch size (effective batch size at a given number of views during training = `NUM_GPUS × max_num_of_imgs_per_gpu / num_of_views`)
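As a sanity check on the tips above, the effective batch size and a linearly scaled learning rate can be computed as follows (the GPU count, reference batch size, and reference learning rate below are hypothetical placeholders, not values taken from the released configs):

```shell
# Hypothetical values for illustration only (not from the released configs)
NUM_GPUS=64          # total GPUs across all nodes
MAX_IMGS_PER_GPU=36  # max_num_of_imgs_per_gpu
NUM_VIEWS=4          # num_of_views during this curriculum stage

# Effective batch size = NUM_GPUS * max_num_of_imgs_per_gpu / num_of_views
EFFECTIVE_BS=$(( NUM_GPUS * MAX_IMGS_PER_GPU / NUM_VIEWS ))
echo "effective batch size: ${EFFECTIVE_BS}"

# Linear LR scaling relative to a hypothetical reference setup
REF_BS=128
REF_LR=0.0001
SCALED_LR=$(awk -v lr="$REF_LR" -v bs="$EFFECTIVE_BS" -v ref="$REF_BS" \
    'BEGIN { printf "%.6f", lr * bs / ref }')
echo "scaled lr: ${SCALED_LR}"
```

With these placeholder numbers, 64 GPUs at 36 images per GPU and 4 views give an effective batch size of 576 multi-view samples per step.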
All original training bash scripts are available in bash_scripts/train/. The main models from the paper can be reproduced using:
```bash
# Stage 1: 4-view training - 8 nodes
bash bash_scripts/train/main/mapa_curri_4v_13d_36ipg_64g.sh 8

# Stage 2: 24-view training - 8 nodes
bash bash_scripts/train/main/mapa_curri_24v_13d_36ipg_64g.sh 8

# Stage 3: 24-view confidence training - 8 nodes
bash bash_scripts/train/main/mapa_v1_1.sh 8

# Likewise for the Apache 2.0 licensed variants - 8 nodes
bash bash_scripts/train/main/mapa_curri_4v_6d_36ipg_64g_apache.sh 8
bash bash_scripts/train/main/mapa_curri_24v_6d_36ipg_64g_apache.sh 8
bash bash_scripts/train/main/mapa_v1_1_apache.sh 8
```

Update the machine configuration in `configs/machine/your_machine.yaml` and adjust the paths in the bash scripts before execution.
We use DINO init for the multi-view transformer, and the default training config expects the weights to be present at `${machine.root_pretrained_checkpoints_dir}/aat_init_w_dinov2_vitg_layers_24_to_40.pth`. This checkpoint can be generated and saved to that path using UniCeption:

```bash
python3 scripts/convert_dino_to_info_sharing.py \
    --output_path ${machine.root_pretrained_checkpoints_dir}/aat_init_w_dinov2_vitg_layers_24_to_40.pth \
    --start 24 \
    --encoder_str dinov2_giant \
    --info_sharing_class alternating
```

We also provide bash scripts for all ablations studied in the paper (see `bash_scripts/train/ablations/`).
Beyond MapAnything training, our modular data and training pipeline can improve the performance of other models. We provide fine-tuning scripts for:

```bash
bash bash_scripts/train/finetuning/moge2_finetuning.sh 8
bash bash_scripts/train/finetuning/vggt_finetuning.sh 8
bash bash_scripts/train/finetuning/pi3_finetuning.sh 8
```

These scripts demonstrate how our comprehensive training data and pipeline can enhance existing 3D reconstruction models, including support for equivariant reference frames (i.e., the scheme used by π³).
We show the average performance on ETH3D, ScanNet++V2 & TartanAirV2-WB at a varying number of views for our finetuned variants of VGGT & π³, trained using the MapAnything data and training framework. Note that there might be data leakage for π³ on our ScanNet++V2 & TartanAirV2-WB benchmarks, since the original train scene splits are unknown. Despite this, our finetuned models achieve higher performance than the public π³ (finetuned on top of VGGT) & VGGT models:
MapAnything uses Hydra for configuration management. Key configuration categories:
- Overall: `configs/train.yaml` - Entrypoint
- Machine: `configs/machine/` - Path settings
- Dataset: `configs/dataset/` - Dataset selection and parameters
- Model: `configs/model/` - Model architecture configurations
- Loss: `configs/loss/` - Loss function definitions
- Train Params: `configs/train_params/` - Training hyperparameters
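Since Hydra composes these groups, individual values can also be overridden from the command line. A minimal sketch, assuming the entrypoint consumes `configs/train.yaml` (the script name `train.py` and the machine/override values shown are illustrative placeholders, not the released defaults):

```bash
# Hypothetical Hydra overrides (script name and values are placeholders)
python train.py \
    machine=your_machine \
    model.info_sharing.module_args.gradient_checkpointing=true \
    model.pred_head.gradient_checkpointing=true
```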
For large-scale training, use the multi-node configurations in the main training scripts:
```bash
# Example multi-node execution
bash bash_scripts/train/main/mapa_curri_24v_13d_48ipg_64g.sh \
    NUM_GPUS NUM_NODES NODE_RANK JOB_ID HOST_NODE_ADDR MAX_RESTARTS
```

The scripts include optimized settings for AWS multi-node training with EFA networking. These can be removed/updated based on your compute infrastructure.
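As an illustration of the positional arguments above, a two-node launch might look like the following (every value here is a hypothetical placeholder; substitute your own GPU count, job ID, and rendezvous host):

```bash
# Hypothetical 2-node launch: 8 GPUs per node, this is node rank 0,
# shared job id, rendezvous on node-0, up to 3 restarts on failure
bash bash_scripts/train/main/mapa_curri_24v_13d_48ipg_64g.sh \
    8 2 0 mapa_job_001 node-0.cluster.local 3
```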
The training scripts support all 13 training datasets (with appropriate splits) converted to WAI format:
- ✅ Aria Synthetic Environments
- ✅ BlendedMVS
- ✅ DL3DV-10K
- ✅ Dynamic Replica
- ✅ Mapillary Planet Scale Depth & Reconstructions (MPSD)
- ✅ MegaDepth (including Tanks & Temples)
- ✅ MVS-Synth
- ✅ Parallel Domain 4D
- ✅ SAIL-VOS 3D
- ✅ ScanNet++ v2
- ✅ Spring
- ✅ TartanAirV2 Wide Baseline
- ✅ UnrealStereo4K
To exactly reproduce the paper results:
- Process the datasets using the instructions in the Data Processing README
- Download pre-computed metadata from HuggingFace
- Run the main training scripts with your machine configuration
- Use the same hyperparameters and data splits as provided in the configs
Results will be saved to the configured `hydra.run.dir` directory with regular checkpointing and training logs.
