DROID Flow Processing Pipeline

This pipeline processes the raw DROID dataset: it extracts RGB images, depth images, and optical flow, and computes scene flow from the stereo camera recordings.

Download the DROID Raw Dataset

# Raw DROID dataset in stereo HD, stored as MP4 videos (8.7TB)
gsutil -m cp -r gs://gresearch/robotics/droid_raw <path_to_your_target_dir>

File Structure

├── droid_raw/
│   └── droid/
│       ├── 1.0.0
│       └── 1.0.1
│           ├── AUTOLab
│           │   ├── success
│           │   │   └── ....
│           │   └── failure
│           ├── CLVR
│           └── .....
├── droid_flow/
└── droid_processed/

Environment Setup

To run the pipeline locally, set up the environment:

# Create and activate conda environment
conda create -n droid_flow python=3.10 -y
conda activate droid_flow

# Install dependencies
pip install torch torchvision
pip install requests

# Install ZED SDK
# Download from: https://www.stereolabs.com/developers/release/
# Run: ./ZED_SDK_Ubuntu22_cuda12.1_v4.1.4.zstd.run -- silent
cd /usr/local/zed/ && python get_python_api.py

# Fix dependencies
conda install -c conda-forge libstdcxx-ng -y
pip install h5py scipy opencv-python==4.10.0.84

# Resolve numpy compatibility
pip uninstall -y numpy
pip install numpy==1.24.0
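
After the steps above, it can help to confirm that the pinned versions are the ones actually installed. The following is a minimal sketch and not part of the repository: the helper name `find_mismatches` is hypothetical, and it only checks the two packages that are pinned in the commands above.

```python
from importlib import metadata

# Versions pinned in the setup commands above.
PINNED = {"numpy": "1.24.0", "opencv-python": "4.10.0.84"}

def find_mismatches(pinned=PINNED, lookup=metadata.version):
    """Return {package: installed_version_or_None} for every package
    whose installed version differs from the pinned one.
    (Hypothetical helper, not part of droid_flow.)"""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            mismatches[name] = found
    return mismatches

if __name__ == "__main__":
    for name, found in find_mismatches().items():
        print(f"{name}: expected {PINNED[name]}, found {found}")
```

An empty result means both packages match the pinned versions.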

Quick Start

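# Clone the repository and enter it (assumes the script is in the repository root)
git clone https://github.com/SalesforceAIResearch/droid_flow.git
cd droid_flow
# Run the pipeline (example for the TRI dataset)
python droid_pipeline_main.py --dataset_name TRI --num_workers 1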
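
To process several datasets in sequence, the same command can be generated in a loop. This is a sketch under assumptions: `build_command` is a hypothetical helper, and it assumes that `--dataset_name` accepts the lab folder names shown under File Structure (AUTOLab, CLVR) in the same way it accepts TRI. Only the two flags shown above are used.

```python
import shlex

def build_command(dataset_name, num_workers=1):
    """Build the pipeline invocation as an argument list.
    (Hypothetical helper; flags are the two shown in Quick Start.)"""
    return [
        "python", "droid_pipeline_main.py",
        "--dataset_name", dataset_name,
        "--num_workers", str(num_workers),
    ]

# Dataset names other than TRI are an assumption based on the folder names.
for name in ("TRI", "AUTOLab", "CLVR"):
    print(shlex.join(build_command(name)))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains `droid_pipeline_main.py`.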

Data Format

All processed data is saved as PNG images with BGR channel order. The exception is depth, which is stored as a single channel and multiplied by 10000 before being cast to uint16 to preserve precision.
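
As a minimal sketch of the depth encoding described above (not the pipeline's actual code), the scaling can be written as plain arithmetic. The function names, the rounding, and the clamp to the uint16 range are assumptions. The source only states the 10000x scale and the uint16 cast.

```python
DEPTH_SCALE = 10000   # scale factor stated in this README
UINT16_MAX = 65535    # largest value a uint16 pixel can hold

def encode_depth(depth):
    """Depth value -> stored uint16 value.
    Rounding and clamping are assumptions, not confirmed by the source."""
    raw = round(depth * DEPTH_SCALE)
    return max(0, min(UINT16_MAX, raw))

def decode_depth(raw):
    """Stored uint16 value -> depth value."""
    return raw / DEPTH_SCALE

print(encode_depth(1.5))    # 15000
print(decode_depth(15000))  # 1.5
```

With this scale, the largest representable depth is 65535 / 10000 = 6.5535 and the step size is 0.0001, in whatever unit the pipeline uses for depth. When reading the PNG with OpenCV, `cv2.IMREAD_UNCHANGED` keeps the 16-bit values intact.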

Optical Flow (2 channels + mask)

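| Channel | Meaning                | Formula                            |
|---------|------------------------|------------------------------------|
| R       | Normalized Δx (pixels) | (Δx + 1/4 * w) / (1/2 * w) * 65536 |
| G       | Normalized Δy (pixels) | (Δy + 1/4 * h) / (1/2 * h) * 65536 |
| B       | Valid pixel mask (0/1) | 65536 = valid, 0 = invalid         |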
  • Where:
    • Δx, Δy = pixel displacements
    • w, h = image width and height
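
The optical flow formulas above can be inverted to recover pixel displacements. The sketch below is illustrative only: the function names are hypothetical, the clamp to 65535 (the uint16 maximum) and the "nonzero means valid" mask test are assumptions, and the width of 1280 is just an example value.

```python
FULL_SCALE = 65536   # multiplier used in the formulas above
UINT16_MAX = 65535   # largest value a uint16 pixel can hold

def encode_flow(delta, size):
    """Pixel displacement along an axis of length `size` (w or h) -> stored value."""
    raw = round((delta + size / 4) / (size / 2) * FULL_SCALE)
    return max(0, min(UINT16_MAX, raw))  # clamp is an assumption

def decode_flow(raw, size):
    """Stored value -> pixel displacement (inverse of the formula)."""
    return raw / FULL_SCALE * (size / 2) - size / 4

def is_valid(mask_raw):
    """Assumption: any nonzero mask value marks a valid pixel."""
    return mask_raw > 0

w = 1280  # example width only
print(encode_flow(10, w))     # 33792
print(decode_flow(33792, w))  # 10.0
```

Under this encoding, zero displacement maps to 32768, and displacements outside roughly [-w/4, w/4) along x (or [-h/4, h/4) along y) cannot be represented.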

Scene Flow (3 channels)

| Channel | Meaning                | Formula              |
|---------|------------------------|----------------------|
| R       | Normalized Δx (meters) | (Δx + 2) / 4 * 65536 |
| G       | Normalized Δy (meters) | (Δy + 2) / 4 * 65536 |
| B       | Normalized Δz (meters) | (Δz + 2) / 4 * 65536 |
  • Where:
    • Δx, Δy, Δz = displacements in meters
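
Decoding a scene flow pixel follows the same pattern: each channel is mapped back from the stored range to meters. This is a sketch, not the pipeline's code. The function names are hypothetical, and the arguments are assumed to be the stored R, G, and B values in that order.

```python
import math

def decode_scene_flow(raw):
    """Stored channel value -> displacement in meters (inverse of the formula)."""
    return raw / 65536 * 4 - 2

def decode_scene_flow_pixel(r, g, b):
    """Decode one pixel into (dx, dy, dz) in meters."""
    return (decode_scene_flow(r), decode_scene_flow(g), decode_scene_flow(b))

def magnitude(flow):
    """Euclidean length of a (dx, dy, dz) displacement in meters."""
    return math.sqrt(sum(c * c for c in flow))

flow = decode_scene_flow_pixel(49152, 32768, 32768)
print(flow)             # (1.0, 0.0, 0.0)
print(magnitude(flow))  # 1.0
```

The encoding covers displacements of about -2 m to +2 m per axis, with a step of 4 / 65536 m (roughly 0.06 mm). Which array index holds each channel depends on the image reader; OpenCV, for example, returns channels in BGR order.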

Output Structure

Each processed episode creates the following directory structure:

droid_processed/
└── {dataset_name}/
    └── {episode_name}/
        ├── metadata.json  
        ├── trajectory.h5   
        ├── camera_left/
        │   ├── rgb/      
        │   ├── depth/      
        │   ├── optical_flow_with_mask/  
        │   └── scene_flow/ 
        ├── camera_right/
        │   ├── rgb/
        │   ├── depth/
        │   ├── optical_flow_with_mask/
        │   └── scene_flow/
        └── camera_wrist/
            ├── rgb/
            ├── depth/
            ├── optical_flow_with_mask/
            └── scene_flow/
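
The layout above can be checked programmatically after a run. The following is a minimal sketch with hypothetical helper names. It only tests that the files and folders listed in the tree exist and makes no assumption about the file names inside each folder.

```python
from pathlib import Path

# Names taken from the directory tree above.
CAMERAS = ("camera_left", "camera_right", "camera_wrist")
MODALITIES = ("rgb", "depth", "optical_flow_with_mask", "scene_flow")
FILES = ("metadata.json", "trajectory.h5")

def expected_paths(episode_dir):
    """List every file and folder an episode directory should contain."""
    episode_dir = Path(episode_dir)
    paths = [episode_dir / name for name in FILES]
    paths += [episode_dir / cam / mod for cam in CAMERAS for mod in MODALITIES]
    return paths

def missing_paths(episode_dir):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_paths(episode_dir) if not p.exists()]
```

For example, `missing_paths("droid_processed/TRI/<episode_name>")` returns an empty list when the episode was written completely; the episode name here is a placeholder.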
