A clean, interactive PyTorch implementation of "World Models" by David Ha and Jürgen Schmidhuber.
This project implements the complete World Models architecture for the CarRacing-v3 environment from Gymnasium. The architecture consists of three components:
- Vision (V): A Variational Autoencoder (VAE) that compresses raw images into latent representations
- Memory (M): A Mixture Density Network with LSTM (MDN-RNN) that predicts future states
- Controller (C): A simple neural network policy trained with CMA-ES
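As a rough sketch of how the three components compose at inference time (module names and the stub encoder are illustrative, not this project's actual API; the dimensions z=32, h=256, and 3 actions follow the paper's CarRacing setup):

```python
import torch
import torch.nn as nn

Z_DIM, H_DIM, A_DIM = 32, 256, 3  # latent, hidden, action (steer, gas, brake)

class Controller(nn.Module):
    """Linear policy on the concatenated latent and hidden state."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(Z_DIM + H_DIM, A_DIM)

    def forward(self, z, h):
        return torch.tanh(self.fc(torch.cat([z, h], dim=-1)))

# Stand-ins for the trained V and M models (the real V is a conv VAE):
vae_encoder = nn.Linear(64 * 64 * 3, Z_DIM)
rnn = nn.LSTMCell(Z_DIM + A_DIM, H_DIM)  # recurrent core of the MDN-RNN
controller = Controller()

obs = torch.rand(1, 64 * 64 * 3)  # flattened 64x64 RGB frame
h, c = torch.zeros(1, H_DIM), torch.zeros(1, H_DIM)

z = vae_encoder(obs)                                # V: compress observation
action = controller(z, h)                           # C: act on [z, h]
h, c = rnn(torch.cat([z, action], dim=-1), (h, c))  # M: update memory
```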
The implementation is organized into interactive notebooks that explain each component:
| Notebook | Description |
|---|---|
| 1-Rollouts.ipynb | Generating dataset from environment interactions |
| 2-Vision (VAE).ipynb | Training the Variational Autoencoder |
| 3-Memory (rnn-mdn).ipynb | Building the MDN-RNN predictive model |
| 4-Controller (C).ipynb | Evolutionary training of the controller |
| 5-Videos.ipynb | Generating videos of model performance |
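The core of the Memory step above is training the MDN-RNN to maximize the likelihood of the next latent under a Gaussian mixture. A minimal sketch of that negative log-likelihood (the function name and tensor layout are assumptions, not the notebook's exact code):

```python
import math
import torch
import torch.nn.functional as F

def mdn_nll(pi_logits, mu, log_sigma, target):
    """Negative log-likelihood of `target` under a diagonal Gaussian mixture.

    pi_logits, mu, log_sigma: (batch, n_mixtures, z_dim) MDN-RNN outputs
    target: (batch, z_dim) next-step latent from the VAE
    """
    target = target.unsqueeze(1)                  # (batch, 1, z_dim)
    log_pi = F.log_softmax(pi_logits, dim=1)      # mixture weights
    # Per-mixture Gaussian log-density of the target latent
    log_prob = -0.5 * (((target - mu) / log_sigma.exp()) ** 2
                       + 2 * log_sigma
                       + math.log(2 * math.pi))
    # Log-sum-exp over mixtures, mean over batch and latent dims
    return -torch.logsumexp(log_pi + log_prob, dim=1).mean()
```

With a single mixture component centered on the target and unit variance, this reduces to the standard Gaussian NLL of 0.5·log(2π) per dimension.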
- **Pure PyTorch** implementation with clean, commented code
- **Interactive Visualization** of latent space and model predictions
- **End-to-End Pipeline** from data collection to agent training
- **Pre-trained Models** included in the `checkpoints/` directory
- **Modular Design** allowing for experimentation with architectures
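The controller is trained without backpropagation, by black-box search over its parameters. A simplified evolution strategy conveys the idea (this is a plain ES update on a toy fitness, not the full CMA-ES covariance adaptation used in the notebook; in practice the fitness is the mean cumulative reward over rollouts):

```python
import numpy as np

def simple_es(fitness, dim, pop_size=64, sigma=0.1, lr=0.05, iters=200, seed=0):
    """Gradient-free search: perturb parameters, weight perturbations by
    (centered) fitness, and move the mean toward better candidates."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)
    for _ in range(iters):
        eps = rng.standard_normal((pop_size, dim))
        rewards = np.array([fitness(theta + sigma * e) for e in eps])
        # Centered-reward gradient estimate (OpenAI-ES style update)
        theta += lr / (pop_size * sigma) * eps.T @ (rewards - rewards.mean())
    return theta

# Toy fitness with optimum at all-ones; a real run would evaluate the
# controller's episode return in the environment instead.
best = simple_es(lambda t: -np.sum((t - 1.0) ** 2), dim=5)
```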
Ha, David and Schmidhuber, Jürgen. "World Models." Zenodo, 2018.
License: Creative Commons Attribution 4.0.