This pipeline processes the raw DROID dataset: it extracts RGB images, depth images, and optical flow, and computes scene flow from the stereo camera recordings.
# Raw DROID dataset in stereo HD, stored as MP4 videos (8.7 TB)
gsutil -m cp -r gs://gresearch/robotics/droid_raw <path_to_your_target_dir>

├── droid_raw/
├── droid/
├── 1.0.0
├── 1.0.1
│ ├── AUTOLab
│ ├── success
│ ├── ....
│ ├── failure
│ ├── CLVR
│ ├── .....
├── droid_flow/
├── droid_processed/
To run the pipeline locally, set up the environment:
# Create and activate conda environment
conda create -n droid_flow python=3.10 -y
conda activate droid_flow
# Install dependencies
pip install torch torchvision
pip install requests
# Install ZED SDK
# Download from: https://www.stereolabs.com/developers/release/
# Run: ./ZED_SDK_Ubuntu22_cuda12.1_v4.1.4.zstd.run -- silent
cd /usr/local/zed/ && python get_python_api.py
# Fix dependencies
conda install -c conda-forge libstdcxx-ng -y
pip install h5py scipy opencv-python==4.10.0.84
# Resolve numpy compatibility
pip uninstall -y numpy
pip install numpy==1.24.0

git clone https://github.com/SalesforceAIResearch/droid_flow.git
# Run the pipeline (example for TRI dataset)
python droid_pipeline_main.py --dataset_name TRI --num_workers 1

All processed data is saved as PNG files with BGR channel order. The exception is depth, which is stored as a single channel: values are multiplied by 10000 before casting to uint16 to preserve precision.
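The depth convention described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the pipeline's own code: the names `encode_depth` and `decode_depth` are hypothetical, and rounding and clipping before the cast are assumptions (the text only states the factor of 10000 and the uint16 target).

```python
import numpy as np

# From the text: depth is multiplied by 10000 before the uint16 cast.
DEPTH_SCALE = 10000.0
UINT16_MAX = 65535


def encode_depth(depth: np.ndarray) -> np.ndarray:
    """Scale a float depth map and cast it to uint16.

    Rounding and clipping are assumptions of this sketch; the pipeline
    may truncate or handle out-of-range values differently.
    """
    scaled = np.round(depth * DEPTH_SCALE)
    return np.clip(scaled, 0, UINT16_MAX).astype(np.uint16)


def decode_depth(png: np.ndarray) -> np.ndarray:
    """Recover float depth (in the original depth unit) from a uint16 image."""
    return png.astype(np.float32) / DEPTH_SCALE
```

With a factor of 10000, a uint16 value can represent depths up to 6.5535 in the original unit at a resolution of 0.0001. To load a stored frame, `cv2.imread(path, cv2.IMREAD_UNCHANGED)` should preserve the 16-bit values before they are passed to `decode_depth`.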
Optical flow images (`optical_flow_with_mask/`) use the following 16-bit encoding:

| Channel | Meaning | Formula |
|---|---|---|
| R | Normalized Δx (pixels) | (Δx + 1/4 * w) / (1/2 * w) * 65536 |
| G | Normalized Δy (pixels) | (Δy + 1/4 * h) / (1/2 * h) * 65536 |
| B | Valid pixel mask (0/1) | 65536 = valid, 0 = invalid |
- Where:
  - Δx, Δy = pixel displacements
  - w, h = image width and height
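Inverting the optical flow formulas gives Δx = R / 65536 · (w / 2) − w / 4 and Δy = G / 65536 · (h / 2) − h / 4, so the representable range is ±w/4 and ±h/4 pixels. The sketch below applies that inversion. It is hedged on several points: `decode_optical_flow` is a hypothetical name, the input is assumed to be an array in BGR order (as `cv2.imread(path, cv2.IMREAD_UNCHANGED)` returns it), and the stored image is assumed to have the same width and height that were used for encoding. Because a 16-bit PNG tops out at 65535, the mask is read as "any nonzero B value" rather than an exact comparison with 65536.

```python
import numpy as np

SCALE = 65536.0  # scale factor taken from the table


def decode_optical_flow(png_bgr: np.ndarray):
    """Return (flow, valid) from a uint16 H x W x 3 array in BGR order.

    flow[..., 0] is the x displacement and flow[..., 1] the y displacement,
    both in pixels; valid is a boolean mask.
    """
    h, w = png_bgr.shape[:2]
    data = png_bgr.astype(np.float64)
    # Assumption: BGR array layout, so R is index 2, G is 1, B is 0.
    r, g, b = data[..., 2], data[..., 1], data[..., 0]
    dx = r / SCALE * (w / 2) - w / 4
    dy = g / SCALE * (h / 2) - h / 4
    valid = b > 0
    return np.stack([dx, dy], axis=-1), valid
```

Pixels where `valid` is false should be ignored in any downstream computation, since their decoded displacements carry no meaning.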
Scene flow images (`scene_flow/`) use the following 16-bit encoding:

| Channel | Meaning | Formula |
|---|---|---|
| R | Normalized Δx (meters) | (Δx + 2) / 4 * 65536 |
| G | Normalized Δy (meters) | (Δy + 2) / 4 * 65536 |
| B | Normalized Δz (meters) | (Δz + 2) / 4 * 65536 |
- Where:
  - Δx, Δy, Δz = displacements in meters
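The scene flow formulas invert to Δ = value / 65536 · 4 − 2 for each channel, which means displacements are limited to the range ±2 meters. A minimal decoding sketch follows; `decode_scene_flow` is a hypothetical name, and the input is again assumed to be a uint16 array in BGR order as returned by `cv2.imread(path, cv2.IMREAD_UNCHANGED)`.

```python
import numpy as np

SCALE = 65536.0   # scale factor taken from the table
OFFSET_M = 2.0    # offset in meters, from the table
RANGE_M = 4.0     # full encoded range in meters, from the table


def decode_scene_flow(png_bgr: np.ndarray) -> np.ndarray:
    """Return an H x W x 3 float array of (dx, dy, dz) in meters."""
    data = png_bgr.astype(np.float64)
    # Assumption: BGR array layout; reversing the last axis yields R, G, B,
    # which the table maps to dx, dy, dz.
    rgb = data[..., ::-1]
    return rgb / SCALE * RANGE_M - OFFSET_M
```

With 16 bits spread over 4 meters, the quantization step of this encoding is roughly 0.06 mm per axis.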
Each processed episode creates the following directory structure:
droid_processed/
└── {dataset_name}/
    └── {episode_name}/
        ├── metadata.json
        ├── trajectory.h5
        ├── camera_left/
        │   ├── rgb/
        │   ├── depth/
        │   ├── optical_flow_with_mask/
        │   └── scene_flow/
        ├── camera_right/
        │   ├── rgb/
        │   ├── depth/
        │   ├── optical_flow_with_mask/
        │   └── scene_flow/
        └── camera_wrist/
            ├── rgb/
            ├── depth/
            ├── optical_flow_with_mask/
            └── scene_flow/
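As a quick sanity check after a run, the layout above can be verified per episode. The helper below is only a sketch: `missing_entries` is a hypothetical name, and it checks that the listed files and folders exist without making any assumption about how individual frames inside them are named.

```python
from pathlib import Path

# Names taken from the directory structure listed above.
CAMERAS = ("camera_left", "camera_right", "camera_wrist")
MODALITIES = ("rgb", "depth", "optical_flow_with_mask", "scene_flow")
FILES = ("metadata.json", "trajectory.h5")


def missing_entries(episode_dir) -> list:
    """Return the relative paths expected in an episode folder but not found."""
    episode_dir = Path(episode_dir)
    missing = [name for name in FILES if not (episode_dir / name).is_file()]
    missing += [
        f"{cam}/{mod}"
        for cam in CAMERAS
        for mod in MODALITIES
        if not (episode_dir / cam / mod).is_dir()
    ]
    return missing
```

An empty list means every expected file and folder is present; any other result names what is missing, for example `camera_wrist/depth`.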