slimCQL provides a concise and customizable implementation of the Deep Q-Network (DQN) and Conservative Q-Learning (CQL) algorithms in Reinforcement Learning ⛳ for Atari environments.
It enables you to quickly code and run proof-of-concept experiments in off-policy deep RL settings.
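For context, CQL trains a standard Q-learning agent while penalizing the Q-values of actions outside the dataset. In the discrete-action CQL($\mathcal{H}$) form of Kumar et al. (2020), the loss adds a log-sum-exp regularizer (weighted by $\alpha$) to the usual TD error; whether slimCQL implements exactly this variant is an assumption on our part:

$$
\mathcal{L}_{\text{CQL}}(\theta) = \alpha \, \mathbb{E}_{(s,a) \sim \mathcal{D}} \Big[ \log \sum_{a'} \exp Q_\theta(s, a') - Q_\theta(s, a) \Big] + \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}} \Big[ \big( Q_\theta(s,a) - r - \gamma \max_{a'} Q_{\bar{\theta}}(s', a') \big)^2 \Big]
$$

where $Q_{\bar{\theta}}$ is the target network; setting $\alpha = 0$ recovers DQN trained on the fixed dataset.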
GPU installation for Atari:

```bash
python3 -m venv env
source env/bin/activate
pip install --upgrade pip setuptools wheel
pip install -e .[dev,gpu]
```

To verify the installation, run the tests:

```bash
pytest
```
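Before launching long runs, you may also want to check that the GPU is visible. The snippet below assumes the `[gpu]` extra installs a JAX backend, which is an assumption about the repo's framework:

```python
# Optional GPU sanity check. The framework is an assumption: if slimCQL is
# PyTorch-based, use torch.cuda.is_available() instead.
import jax

print(jax.devices())  # a successful [gpu] install should list a CUDA device
```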
To train an offline RL agent on an Atari game for a given seed, we first download the dataset published in this work 👉 RL Unplugged (RLU). It contains the trajectories seen by a DQN agent trained for 50 million transitions.
To prepare the dataset for VideoPinball, seed 1:
- Download the dataset from the RLU GCP bucket (`run_1*` fetches all files corresponding to seed 1):

  ```bash
  mkdir -p experiments/atari/datasets/rlu_dataset/VideoPinball
  gsutil -m cp -R gs://rl_unplugged/atari_episodes_ordered/VideoPinball/run_1* experiments/atari/datasets/rlu_dataset/VideoPinball
  ```

  This stores the raw trajectories in `experiments/atari/datasets/rlu_dataset/VideoPinball`.

- Convert the raw dataset into condensed numpy arrays (these require much less space, as the conversion removes redundant information) by running:

  ```bash
  python3 experiments/atari/rlu_to_numpy.py --game [GAME] --run [RUN]
  ```

  Once complete, the arrays for the given game and run are stored in `experiments/atari/datasets/numpy_dataset/VideoPinball/1`.

- Now you can prepare the replay buffers (used by the offline RL agent) for given values of $n$ and $\gamma$, by setting the `update_horizon` and `gamma` variables respectively in `experiments/atari/prepare_replay_buffers.py` (see the sketch after this list for what these two variables control), and running:

  ```bash
  python3 experiments/atari/prepare_replay_buffers.py --game [GAME] --run [RUN]
  ```

  Upon completion, the replay buffers are stored in `experiments/atari/datasets/slim_dataset/VideoPinball/1`.
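To clarify what `update_horizon` ($n$) and `gamma` ($\gamma$) control, here is a minimal sketch of the $n$-step reward accumulation that buffer preparation conceptually performs; the function name and array layout are illustrative, not the repo's actual code:

```python
# Illustrative n-step reward computation (names and layout are hypothetical;
# the actual buffer preparation lives in prepare_replay_buffers.py).
import numpy as np

def n_step_rewards(rewards: np.ndarray, terminals: np.ndarray,
                   update_horizon: int, gamma: float) -> np.ndarray:
    """For each index t, accumulate sum_{k=0}^{n-1} gamma^k * r_{t+k},
    truncating the sum at episode boundaries (terminal transitions)."""
    num_steps = len(rewards)
    out = np.zeros(num_steps)
    for t in range(num_steps):
        ret, discount = 0.0, 1.0
        for k in range(update_horizon):
            if t + k >= num_steps:
                break
            ret += discount * rewards[t + k]
            if terminals[t + k]:  # do not accumulate across episode ends
                break
            discount *= gamma
        out[t] = ret
    return out

# Example: 3-step rewards with gamma = 0.99 on a 4-step episode
r = np.array([1.0, 0.0, 2.0, 1.0])
d = np.array([False, False, False, True])
print(n_step_rewards(r, d, update_horizon=3, gamma=0.99))
# -> [2.9602, 2.9601, 2.99, 1.0]
```

In the standard $n$-step target, the bootstrap term $\gamma^{n} \max_{a'} Q(s_{t+n}, a')$ is then added on top of these stored partial returns at training time.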
The dataset is now ready for learning! At this point, you can delete the downloaded raw dataset in `experiments/atari/datasets/rlu_dataset/VideoPinball` if you don't need it anymore.
To train a CQL agent on VideoPinball on your local system, run the launch file:

```bash
launch_job/atari/launch.sh
```

This trains and evaluates a CQL agent on VideoPinball (seed 1) with a CNN architecture, for 3.125 million gradient steps, using 10% of the dataset collected by the DQN agent.
- To see the stage of training, check the logs in the `experiments/atari/logs/test_VideoPinball/cql` folder.
- The models at the end of each epoch are stored in the `experiments/atari/exp_output/test_VideoPinball/cql/models` folder.
- To change the percentage of the DQN dataset used for training to $p$%, set `replay_buffer_capacity` to $\lfloor p$% $\times 1,000,000 \rfloor$ in the launch file; for example, $p = 10$ gives a capacity of 100,000 transitions (see the helper below).
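As a quick reference, the capacity formula can be computed as follows; `capacity_for_percentage` is a plain helper written here for illustration, not a function in the repo:

```python
# Illustrative helper for the capacity formula floor(p% x 1,000,000);
# the launch file just needs the resulting integer.
import math

def capacity_for_percentage(p: float, dataset_size: int = 1_000_000) -> int:
    return math.floor(p / 100.0 * dataset_size)

print(capacity_for_percentage(10))   # 100000 -> the 10% run described above
print(capacity_for_percentage(2.5))  # 25000
```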
To train on a cluster, change `launch_job/atari/local_cql.sh` to `launch_job/atari/cluster_cql.sh` in the launch file, and run the launch file.
The launch file organizes the evaluation upon completion of training as follows:
- `experiments/atari/evaluate.py` is used to evaluate the model at the end of every epoch.
- Once complete, `experiments/synchronize_evaluation_wandb.py` is used to:
  - Combine the evaluation returns into the `experiments/atari/exp_output/test_VideoPinball/cql/episode_returns_and_lengths` folder (see the loading sketch after this list),
  - Delete the models corresponding to all epochs but the last (the one at the end of training),
  - Update wandb with the evaluation results (if the `--disable_wandb` flag is not turned on).
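If you want to inspect the combined returns outside of wandb, something like the following can work; note that the file layout inside `episode_returns_and_lengths` is an assumption here, so adapt the glob and loader to the files the script actually writes:

```python
# Hypothetical post-hoc summary of evaluation returns. The assumption that the
# folder contains numpy-loadable .npy files is mine, not the repo's docs.
import glob
import numpy as np

folder = ("experiments/atari/exp_output/test_VideoPinball/cql/"
          "episode_returns_and_lengths")
for path in sorted(glob.glob(f"{folder}/*.npy")):  # assumed .npy files
    returns = np.load(path).ravel()
    print(f"{path}: mean return {returns.mean():.1f} "
          f"+/- {returns.std():.1f} over {returns.size} episodes")
```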