
DreamX-World: A General-Purpose Interactive World Model

DreamX Team



DreamX-World is a general-purpose world model for interactive world simulation. It generates diverse, high-fidelity worlds that users can explore, control, and transform with event prompts.

The model is trained with a scalable data engine on Unreal Engine data, gameplay footage, and real-world videos, combined with camera estimation and strict data filtering, so that it learns realistic dynamics and interactions. Training is progressive: the model first learns fine-grained action control, then open-ended event response, and is then refined with reinforcement learning to improve action following, interaction consistency, and visual fidelity. Finally, through forcing and distillation, DreamX-World achieves efficient inference, making interactive generation practical at scale.

🔥 News

📆 Plan

  • ✔️ DreamX-World-5B-Cam Model.
  • ⬜ DreamX-World-14B-Cam Model.
  • ⬜ Autoregressive Video Generation Model.
  • ⬜ Audio-Video Joint Generation Model.
  • ⬜ Real-Time, Interactive, Long-Horizon DreamX-World Model.
  • ⬜ Release Technical Report.

🚀 Quick Start

Setup

  1. Install dependencies:

     pip install -r requirements.txt

  2. Download the Wan2.2-5B-TI2V checkpoints from https://huggingface.co/Wan-AI
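The checkpoint download in step 2 can be scripted with the Hugging Face CLI. A minimal sketch, assuming the repo id `Wan-AI/Wan2.2-TI2V-5B` and a local `./checkpoints` directory (both are assumptions — confirm the canonical repo name on the Wan-AI page):

```shell
# Sketch only: fetch the Wan2.2-5B-TI2V base checkpoints.
# Repo id and target directory are assumptions; verify on https://huggingface.co/Wan-AI
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir ./checkpoints/Wan2.2-TI2V-5B
```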

Inference

To generate videos, run the following script:

sh inference_5b.sh

Please check out inference_README.md for detailed instructions.

📍 Checkpoints

| Model | Download Link | Details | Instructions |
| --- | --- | --- | --- |
| DreamX-World-5B-Cam | HuggingFace, ModelScope | w/ PRoPE Camera Control | inference_README.md |

🎬 Video Demo

🌍 Navigate and Explore Realistic Worlds

DreamX-World enables high-fidelity, controllable exploration across diverse realistic environments, including indoor, urban, natural, and architectural scenes.

01.mp4
02.mp4
03.mp4
04.mp4
05.mp4
06.mp4
07.mp4
08.mp4

🌈 Dive into Dream Worlds

Beyond realistic scenes, DreamX-World also generates fantasy, game-like, sci-fi, and stylized worlds.

01.mp4
02.mp4
03.mp4
04.mp4
06.mp4
07.mp4
08.mp4
09.mp4

🎮 Generate in Third-Person View

DreamX-World supports both first-person interaction and coherent third-person generation. It keeps camera-follow behavior stable while preserving controllable agent motion and scene consistency.

01.mp4
02.mp4
03.mp4
04.mp4
05.mp4
07.mp4
08.mp4
10.mp4

⚡ Promptable World Events

DreamX-World supports prompt-driven world events that dynamically change the environment, including flexible and compositional event generation with consistent temporal evolution.

  • Single Event: A single event prompt triggers a specific world-changing interaction.
  • Compositional Events: Multiple events compose together to create complex, multi-step world transformations.

Single Event

01.mp4
02.mp4
03.mp4
04.mp4

Compositional Events

05.mp4
06.mp4
07.mp4
08.mp4

💬 WeChat Group

Join our WeChat group for discussion:

WeChat Group QR Code

Contact: 📧 ally.sl@alibaba-inc.com | hongxi.wjh@alibaba-inc.com

📜 License

This project is licensed under Apache 2.0. See LICENSE for details.

✨ Acknowledgement

We thank the Wan Team for open-sourcing their code and models.
