
slimCQL - simple, minimal and flexible offline Deep RL

Python | JAX | Code style: black | License: MIT

slimCQL provides a concise and customizable implementation of the Deep Q-Network (DQN) and Conservative Q-Learning (CQL) algorithms in Reinforcement Learning⛳ for Atari environments. It enables you to quickly code and run proof-of-concept experiments in off-policy Deep RL settings.
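To make the core idea concrete, here is a minimal sketch (not this repository's actual implementation) of the CQL objective for discrete actions in JAX: the standard DQN TD loss plus a conservative penalty that pushes down the Q-values of out-of-distribution actions relative to the dataset action. All names and the alpha coefficient are illustrative.

```python
import jax
import jax.numpy as jnp

def cql_loss(q_values, next_q_values, actions, rewards, dones,
             gamma=0.99, alpha=1.0):
    """q_values: (batch, n_actions) from the online network;
    next_q_values: (batch, n_actions) from the target network."""
    batch = jnp.arange(q_values.shape[0])
    q_taken = q_values[batch, actions]

    # Standard 1-step TD loss (stop-gradient on the bootstrap target).
    target = rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)
    td_loss = jnp.mean((q_taken - jax.lax.stop_gradient(target)) ** 2)

    # Conservative penalty: log-sum-exp over all actions minus the
    # Q-value of the action actually taken in the dataset.
    cql_penalty = jnp.mean(
        jax.scipy.special.logsumexp(q_values, axis=1) - q_taken
    )
    return td_loss + alpha * cql_penalty
```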

User installation

GPU installation for Atari:

python3 -m venv env
source env/bin/activate
pip install --upgrade pip setuptools wheel
pip install -e .[dev,gpu]

To verify the installation, run the test suite with: pytest
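If you installed the GPU extra, you can additionally check that JAX sees the GPU (the exact device name printed varies across JAX versions):

```python
import jax

# Should list at least one GPU device, e.g. [CudaDevice(id=0)];
# a CPU-only install prints [CpuDevice(id=0)] instead.
print(jax.devices())
```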

Running experiments

Dataset preparation

To train an offline RL agent on an Atari game for a given seed, we first download the dataset published in this work 👉 RL Unplugged (RLU). It contains the trajectories experienced by a DQN agent over 50 million training transitions.

To prepare the dataset for VideoPinball, seed 1:

  1. Download the dataset from RLU GCP bucket (run_1* fetches all files corresponding to seed 1):

    mkdir -p experiments/atari/datasets/rlu_dataset/VideoPinball
    gsutil -m cp -R gs://rl_unplugged/atari_episodes_ordered/VideoPinball/run_1* experiments/atari/datasets/rlu_dataset/VideoPinball

    This stores the raw trajectories in experiments/atari/datasets/rlu_dataset/VideoPinball.

  2. Convert the raw dataset into condensed numpy arrays (which take up much less space, since redundant information is removed) by running: python3 experiments/atari/rlu_to_numpy.py --game [GAME] --run [RUN]. Once complete, the arrays for the given game and run are stored in experiments/atari/datasets/numpy_dataset/VideoPinball/1.

  3. Now you can prepare the replay buffers (used by the offline RL agent) for given values of $n$ and $\gamma$ by setting the update_horizon and gamma variables, respectively, in experiments/atari/prepare_replay_buffers.py, and running: python3 experiments/atari/prepare_replay_buffers.py --game [GAME] --run [RUN]. Upon completion, the replay buffers are stored in experiments/atari/datasets/slim_dataset/VideoPinball/1 (a sketch of the $n$-step return computation follows this list).
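For intuition, here is an illustrative (not the repository's actual) computation of the $n$-step return that such a replay buffer stores for each transition, given update_horizon $n$ and discount $\gamma$; the function name and signature are hypothetical:

```python
import numpy as np

def n_step_return(rewards, terminals, t, n, gamma):
    """Discounted sum of up to n rewards starting at time t (illustrative)."""
    ret, horizon = 0.0, 0
    for k in range(n):
        ret += (gamma ** k) * rewards[t + k]
        horizon = k + 1
        if terminals[t + k]:  # episode ended before the full horizon
            break
    # The agent later bootstraps with gamma**horizon * max_a Q(s_{t+horizon}, a).
    return ret, horizon
```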

The dataset is now ready for learning! At this point, you can delete the downloaded raw dataset in experiments/atari/datasets/rlu_dataset/VideoPinball if you no longer need it.

Training

To train a CQL agent on VideoPinball on your local system, run the launch file:
launch_job/atari/launch.sh

It trains and evaluates a CQL agent on VideoPinball (seed 1) with a CNN architecture, for 3.125 million gradient steps, using 10% of the dataset collected by the DQN agent.

  • To monitor the progress of training, check the logs in the experiments/atari/logs/test_VideoPinball/cql folder.
  • The model at the end of each epoch is stored in the experiments/atari/exp_output/test_VideoPinball/cql/models folder.
  • To set the percentage of the DQN dataset used in training to $p$%, set replay_buffer_capacity to $\lfloor p\% \times 1{,}000{,}000 \rfloor$ in the launch file (see the worked example below).
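As a quick sanity check of the capacity formula, with the default setup described above ($p = 10$):

```python
# p% of the 1,000,000-transition budget, rounded down.
p = 10  # percentage of the DQN dataset to use
replay_buffer_capacity = int(p / 100 * 1_000_000)
print(replay_buffer_capacity)  # -> 100000
```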

To train on a cluster, replace launch_job/atari/local_cql.sh with launch_job/atari/cluster_cql.sh in the launch file, and run the launch file.

Evaluation

The launch file organizes the evaluation upon completion of training as follows:

  • experiments/atari/evaluate.py is used to evaluate the model at the end of every epoch (a sketch of a typical evaluation loop follows this list).
  • Once complete, experiments/synchronize_evaluation_wandb.py is used to:
    • Combine the evaluation returns into the experiments/atari/exp_output/test_VideoPinball/cql/episode_returns_and_lengths folder,
    • Delete the models of all epochs except the last one (the model at the end of training),
    • Update wandb with the evaluation results (unless the --disable_wandb flag is set).
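For reference, here is a minimal sketch of what a per-epoch evaluation loop typically computes (the actual logic lives in experiments/atari/evaluate.py; q_fn and all other names here are hypothetical):

```python
import gymnasium as gym
import numpy as np

def evaluate(env: gym.Env, q_fn, n_episodes: int = 10):
    """Roll out the greedy policy; q_fn maps an observation to Q-values."""
    returns, lengths = [], []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        done, ep_return, ep_length = False, 0.0, 0
        while not done:
            action = int(np.argmax(q_fn(obs)))  # greedy w.r.t. learned Q-values
            obs, reward, terminated, truncated, _ = env.step(action)
            ep_return += reward
            ep_length += 1
            done = terminated or truncated
        returns.append(ep_return)
        lengths.append(ep_length)
    return returns, lengths
```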
