
DreamX-World: A General-Purpose Interactive World Model

DreamX Team



DreamX-World is a general-purpose world model for interactive world simulation. It generates diverse, high-fidelity worlds that users can explore, control, and transform with event prompts.

The model is trained with a scalable data engine on Unreal Engine data, gameplay footage, and real-world videos, combined with camera estimation and strict data filtering, so that it learns realistic dynamics and interactions. Training is progressive: the model first learns fine-grained action control, then open-ended event response, and is then refined with reinforcement learning to improve action following, interaction consistency, and visual fidelity. Finally, through forcing and distillation, DreamX-World achieves efficient inference, making interactive generation practical at scale.

🔥 News

📆 Plan

  • ✔️ DreamX-World-5B-Cam Model.
  • ⬜ DreamX-World-14B-Cam Model.
  • ⬜ Autoregressive Video Generation Model.
  • ⬜ Audio-Video Joint Generation Model.
  • ⬜ Real-Time, Interactive, Long-Horizon DreamX-World Model.
  • ⬜ Release Technical Report.

🚀 Quick Start

Setup

  1. Install dependencies:

     pip install -r requirements.txt

  2. Download the Wan2.2-5B-TI2V checkpoints from https://huggingface.co/Wan-AI
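The checkpoint download in step 2 can be scripted with the Hugging Face CLI. A minimal sketch, assuming the repo id `Wan-AI/Wan2.2-TI2V-5B` and a local `./checkpoints` directory (both are assumptions — confirm the canonical repo name on the Wan-AI page):

```shell
# Sketch only: fetch the Wan2.2-5B-TI2V base checkpoints.
# Repo id and target directory are assumptions; verify on https://huggingface.co/Wan-AI
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir ./checkpoints/Wan2.2-TI2V-5B
```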

Inference

To generate videos, run the following script:

sh inference_5b.sh

Please check out inference_README.md for detailed instructions.

📍 Checkpoints

| Model | Download Link | Details | Instructions |
| --- | --- | --- | --- |
| DreamX-World-5B-Cam | HuggingFace, ModelScope | w/ PRoPE Camera Control | inference_README.md |

🎬 Video Demo

🌍 Navigate and Explore Realistic Worlds

DreamX-World enables high-fidelity, controllable exploration across diverse realistic environments, including indoor, urban, natural, and architectural scenes.

01.mp4
02.mp4
03.mp4
04.mp4
05.mp4
06.mp4
07.mp4
08.mp4

🌈 Dive into Dream Worlds

Beyond realistic scenes, DreamX-World also generates fantasy, game-like, sci-fi, and stylized worlds.

01.mp4
02.mp4
03.mp4
04.mp4
06.mp4
07.mp4
08.mp4
09.mp4

🎮 Generate in Third-Person View

DreamX-World supports both first-person interaction and coherent third-person generation. It keeps camera-follow behavior stable while preserving controllable agent motion and scene consistency.

01.mp4
02.mp4
03.mp4
04.mp4
05.mp4
07.mp4
08.mp4
10.mp4

⚡ Promptable World Events

DreamX-World supports prompt-driven world events that dynamically change the environment, including flexible and compositional event generation with consistent temporal evolution.

  • Single Event: A single event prompt triggers a specific world-changing interaction.
  • Compositional Events: Multiple events compose together to create complex, multi-step world transformations.

Single Event

01.mp4
02.mp4
03.mp4
04.mp4

Compositional Events

05.mp4
06.mp4
07.mp4
08.mp4

💬 WeChat Group

Join our WeChat group for discussion:

WeChat Group QR Code

Contact: 📧 ally.sl@alibaba-inc.com | hongxi.wjh@alibaba-inc.com

📜 License

This project is licensed under Apache 2.0. See LICENSE for details.

✨ Acknowledgement

We thank the Wan Team for open-sourcing their code and models.
