Zero-Shot Motion Tracking for Unitree G1

Text-to-Physics in one command: type a motion description and get a physically simulated humanoid robot performing it, with no training required.

This streamlined pipeline takes a text prompt, generates a motion with Kimodo, and runs a physics-simulated Unitree G1 humanoid robot tracking that motion in MuJoCo.

The pipeline uses NVIDIA's ProtoMotions pretrained Generalist Tracking Policy (GTP), exported as an ONNX model, to drive PD joint controllers in a full MuJoCo physics simulation with gravity, contacts, and actuator dynamics.

Highlights

Zero-shot: No per-motion training. A single pretrained policy tracks any motion.
Text-to-physics: Describe a motion in natural language, watch a robot perform it with real physics.
Full physics simulation: Gravity, ground contacts, 29 PD-controlled actuators at 1 kHz.
Fast: Motion generation takes ~2 s on a GPU, and the physics sim runs at 20x real-time.
Self-contained: All assets (ONNX model, G1 MJCF, meshes) bundled. Just pip install and run.

Demo

All motions below were generated from text prompts by Kimodo, then physically simulated with the ProtoMotions GTP in MuJoCo (full physics: gravity, contacts, PD actuators).

Motion	Prompt
Confident Walk	`"a person walking forward with long confident strides while swinging their arms"`
Boxing	`"a person throwing left and right punches while stepping forward in a boxing stance"`
Arms Raise	`"a person standing in place and raising both arms above their head then lowering them"`
Bowing	`"a person bowing forward politely and then standing upright again"`
Side Step	`"a person stepping sideways to the left and then sideways to the right"`
Squats	`"a person doing squats"`
Walking Forward	`"a person walking forward"`
Turning Around	`"a person turning around slowly"`

How It Works

Text Prompt ("a person walking forward")
        |
        v
   Kimodo G1 Model
   (text-to-motion diffusion)
        |
        v
   MuJoCo qpos [T, 36]
   (root xyz + quat + 29 joint angles)
        |
        v
   Proto Bridge (MuJoCo FK)
   body_pos [T, 33, 3], body_rot [T, 33, 4], dof [T, 29]
   resampled 30 Hz -> 50 Hz via SLERP
        |
        v
   ONNX Tracker Control Loop (50 Hz)
   +--> Read robot state from MuJoCo
   |    Query 4 future reference frames [+1, +2, +4, +8]
   |    ONNX inference -> PD joint targets
   |    Apply accel clamp + EMA filter
   |    Step MuJoCo physics (20 substeps @ 1kHz)
   +--- Update viewer
        |
        v
   G1 Robot in MuJoCo Viewer / Viser
   (full physics: gravity, contacts, PD actuators)
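The "accel clamp + EMA filter" step in the loop above is not spelled out in this README. One plausible reading, sketched here under stated assumptions, is to limit the acceleration of the PD targets between control steps and then blend the result with the previous target using an exponential moving average. The class name `TargetSmoother`, the `max_accel` limit, and the `ema_alpha` weight are all hypothetical and are not taken from the pipeline code.

```python
import numpy as np

class TargetSmoother:
    """Hypothetical smoother for PD targets: acceleration clamp, then EMA."""

    def __init__(self, num_dofs, dt=0.02, max_accel=200.0, ema_alpha=0.5):
        self.dt = dt                    # control period (50 Hz -> 0.02 s)
        self.max_accel = max_accel      # rad/s^2, illustrative value only
        self.ema_alpha = ema_alpha      # weight given to the newest target
        self.prev = None                # last emitted target
        self.prev_vel = np.zeros(num_dofs)

    def __call__(self, raw_target):
        raw_target = np.asarray(raw_target, dtype=np.float64)
        if self.prev is None:
            # First call: nothing to smooth against yet.
            self.prev = raw_target.copy()
            return self.prev.copy()
        # Finite-difference velocity and acceleration of the target signal.
        vel = (raw_target - self.prev) / self.dt
        accel = np.clip((vel - self.prev_vel) / self.dt,
                        -self.max_accel, self.max_accel)
        # Rebuild the target from the clamped acceleration.
        clamped = self.prev + (self.prev_vel + accel * self.dt) * self.dt
        # EMA blend between the clamped target and the previous output.
        smoothed = self.ema_alpha * clamped + (1.0 - self.ema_alpha) * self.prev
        self.prev_vel = (smoothed - self.prev) / self.dt
        self.prev = smoothed
        return smoothed.copy()

smoother = TargetSmoother(num_dofs=29)
first = smoother(np.zeros(29))
second = smoother(np.full(29, 1.0))    # a 1 rad jump is heavily attenuated
```

A filter like this trades tracking sharpness for smoother joint commands; lower `ema_alpha` or `max_accel` values give smoother but laggier targets.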

Quick Start

Prerequisites

Python 3.10+
CUDA GPU (for Kimodo motion generation)
Kimodo installed (pip install -e . in the kimodo repo)

Install Dependencies

pip install onnxruntime pyyaml mujoco viser

Run

# From a text prompt (requires Kimodo model)
python -m pipeline.run_g1_zeroshot --prompt "a person walking forward" --duration 5

# From an existing MuJoCo qpos CSV
python -m pipeline.run_g1_zeroshot --csv output.csv

# From an existing Kimodo NPZ output
python -m pipeline.run_g1_zeroshot --npz output.npz

A viewer will open showing the G1 robot physically tracking the motion: the Viser web viewer at http://localhost:8080 by default, or the MuJoCo native viewer with --native-viewer.

Viewer Options

# Viser web viewer (default) - open http://localhost:8080
python -m pipeline.run_g1_zeroshot --prompt "walking" --duration 3

# MuJoCo native viewer
python -m pipeline.run_g1_zeroshot --prompt "walking" --duration 3 --native-viewer

# Headless (no visualization)
python -m pipeline.run_g1_zeroshot --prompt "walking" --duration 3 --no-render

# Set the number of loops and disable real-time pacing
python -m pipeline.run_g1_zeroshot --csv output.csv --loops 5 --no-realtime

Record Video

python -m pipeline.record_video --prompt "a person doing squats" --duration 5 --output my_video.mp4

Architecture

File Structure

pipeline/
    run_g1_zeroshot.py        # Main entry point (text/CSV/NPZ -> physics sim)
    record_video.py           # Offscreen recording to MP4
    proto_bridge.py           # Kimodo qpos -> MotionPlayer format via MuJoCo FK
    setup_proto_assets.py     # Helper to copy assets from ProtoMotions clone
    deploy/                   # Vendored ProtoMotions deployment code
        state_utils.py        #   Pure-numpy quaternion/rotation utilities
        motion_utils.py       #   MotionPlayer with self-contained SLERP
        mujoco_runner.py      #   ONNX tracker control loop + viewer
    assets/
        proto_g1/             # ProtoMotions G1 MJCF + meshes (33 bodies, 29 DOFs)
        proto_tracker/        # Pretrained ONNX tracker + YAML sidecar
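As an illustration of the 30 Hz to 50 Hz resampling that proto_bridge.py and motion_utils.py are described as doing, here is a minimal pure-numpy sketch. It is not the vendored code: the function names `slerp` and `resample_motion` and their signatures are assumptions. Positions and joint angles are interpolated linearly, and body rotations use SLERP, which works for either quaternion component order as long as it is used consistently.

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical interpolation between unit quaternions of shape [..., 4]."""
    dot = np.sum(q0 * q1, axis=-1, keepdims=True)
    q1 = np.where(dot < 0.0, -q1, q1)          # take the shorter arc
    dot = np.clip(np.abs(dot), 0.0, 1.0)
    theta = np.arccos(dot)
    sin_theta = np.sin(theta)
    near = sin_theta < 1e-6                    # nearly identical rotations
    safe = np.where(near, 1.0, sin_theta)
    w0 = np.where(near, 1.0 - t, np.sin((1.0 - t) * theta) / safe)
    w1 = np.where(near, t, np.sin(t * theta) / safe)
    out = w0 * q0 + w1 * q1
    return out / np.linalg.norm(out, axis=-1, keepdims=True)

def resample_motion(body_pos, body_rot, dof, src_hz=30.0, dst_hz=50.0):
    """Resample body_pos [T, B, 3], body_rot [T, B, 4], dof [T, D] to dst_hz."""
    T = body_pos.shape[0]
    n_out = int(np.floor((T - 1) / src_hz * dst_hz + 1e-9)) + 1
    pos_out = np.empty((n_out,) + body_pos.shape[1:])
    rot_out = np.empty((n_out,) + body_rot.shape[1:])
    dof_out = np.empty((n_out,) + dof.shape[1:])
    for k in range(n_out):
        s = k * src_hz / dst_hz                # fractional source frame index
        i0 = min(int(s), T - 1)
        i1 = min(i0 + 1, T - 1)
        t = s - i0
        pos_out[k] = (1.0 - t) * body_pos[i0] + t * body_pos[i1]
        dof_out[k] = (1.0 - t) * dof[i0] + t * dof[i1]
        rot_out[k] = slerp(body_rot[i0], body_rot[i1], t)
    return pos_out, rot_out, dof_out

# One second of identity poses at 30 Hz (31 frames) becomes 51 frames at 50 Hz.
T = 31
rot = np.zeros((T, 33, 4))
rot[..., 0] = 1.0
pos50, rot50, dof50 = resample_motion(np.zeros((T, 33, 3)), rot, np.zeros((T, 29)))
```

Linear interpolation of joint angles is a simplification that is reasonable for small per-frame changes; it does not handle angle wrap-around.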

Key Components

Component	Source	Purpose
Kimodo	nv-tlabs/kimodo	Text-to-motion diffusion model for G1
ProtoMotions GTP	NVlabs/ProtoMotions	Pretrained tracking policy (ONNX)
MuJoCo	mujoco.org	Physics simulation (1kHz)
Viser	nerfstudio-project/viser	3D web viewer

Physics Details

Simulation: MuJoCo with full physics (gravity, contacts, friction)
Control rate: 50 Hz (20ms per control step)
Physics rate: 1000 Hz (1ms substeps, 20x decimation)
Actuators: 29 implicit PD controllers with per-joint stiffness/damping
Robot: Unitree G1 (33 bodies, 29 DOFs)
Policy: BeyondMimic tracker with 4-step future lookahead [+1, +2, +4, +8 frames]
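
The rates above imply 1000 / 50 = 20 physics substeps for every policy action. The sketch below illustrates that decimation with an explicit PD law on a toy single-inertia joint that stands in for MuJoCo (no gravity, contacts, or coupling between joints). The gains, the inertia, and the names `pd_torque` and `control_step` are illustrative assumptions; the pipeline's actuators are described as implicit PD, which this sketch does not reproduce.

```python
import numpy as np

CONTROL_HZ = 50
PHYSICS_HZ = 1000
DECIMATION = PHYSICS_HZ // CONTROL_HZ      # 20 substeps per control step

def pd_torque(q_target, q, qd, kp, kd):
    """Explicit PD law: tau = kp * (q_target - q) - kd * qd."""
    return kp * (q_target - q) - kd * qd

def control_step(q, qd, q_target, kp, kd, inertia, dt=1.0 / PHYSICS_HZ):
    """Hold one PD target for DECIMATION substeps of a toy joint model."""
    for _ in range(DECIMATION):
        tau = pd_torque(q_target, q, qd, kp, kd)
        qd = qd + dt * tau / inertia       # semi-implicit Euler integration
        q = q + dt * qd
    return q, qd

# 29 joints driven toward a constant 0.3 rad target for one simulated second.
q = np.zeros(29)
qd = np.zeros(29)
target = np.full(29, 0.3)
for _ in range(CONTROL_HZ):
    q, qd = control_step(q, qd, target, kp=100.0, kd=20.0, inertia=1.0)
```

In the real pipeline the target changes every control step (it comes from the ONNX policy), while the PD loop runs at the physics rate between policy updates.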

Zero-Shot Tracking

The pretrained GTP was trained on a large, diverse motion dataset. At inference time it tracks new motions without retraining. At each control step, the policy:

Receives the current robot state (joint positions, velocities, torso orientation)
Receives 4 future reference frames from the target motion
Outputs PD joint position targets that physically track the motion while maintaining balance

This contrasts with per-motion training approaches (such as DeepMimic with PPO), which require hours of training for each new motion clip.
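
To make the 4-frame lookahead concrete, the sketch below gathers reference frames at offsets +1, +2, +4, and +8 from the current frame. At the 50 Hz control rate these correspond to 20, 40, 80, and 160 ms ahead. Clamping to the last frame near the end of the clip is an assumption made here for illustration (the actual MotionPlayer may loop or handle the boundary differently), and the function names are hypothetical.

```python
import numpy as np

FUTURE_OFFSETS = (1, 2, 4, 8)    # lookahead offsets in control frames

def future_frame_indices(current, num_frames, offsets=FUTURE_OFFSETS):
    """Indices of the future reference frames, clamped to the clip length."""
    idx = current + np.asarray(offsets)
    return np.minimum(idx, num_frames - 1)

def gather_future_refs(dof_ref, current):
    """dof_ref: [T, 29] reference joint angles -> [4, 29] lookahead window."""
    return dof_ref[future_frame_indices(current, dof_ref.shape[0])]

# A 5 s clip at 50 Hz has 250 frames.
dof_ref = np.zeros((250, 29))
window = gather_future_refs(dof_ref, current=100)   # frames 101, 102, 104, 108
```

The geometric spacing gives the policy dense information about the immediate future and a coarser preview of where the motion is heading.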

Setting Up Assets

The pipeline needs ProtoMotions assets (ONNX model + G1 MJCF). These are bundled in pipeline/assets/. To regenerate from source:

# Clone ProtoMotions
git clone https://github.com/NVlabs/ProtoMotions.git

# Copy assets
python -m pipeline.setup_proto_assets --proto-path ./ProtoMotions

Acknowledgements

Kimodo - Kinematic Motion Diffusion Model (NVIDIA)
ProtoMotions - Physics simulation and RL framework for humanoids (NVIDIA)
MuJoCo - Multi-Joint dynamics with Contact (Google DeepMind)

License

This pipeline code is provided for research purposes. The bundled assets (ONNX model, MJCF) are from ProtoMotions (Apache 2.0). Kimodo is licensed under Apache 2.0 with NVIDIA Open Model License.
