
Intrinsic Reward Policy Optimization


Authors

  • Minjae Cho (corresponding author) - The Grainger College of Engineering, University of Illinois Urbana-Champaign
  • Huy T. Tran - The Grainger College of Engineering, University of Illinois Urbana-Champaign

Citation

Please cite our paper if you use this code or algorithm for any part of your research or work:

@misc{cho2026intrinsicrewardpolicyoptimization,
      title={Intrinsic Reward Policy Optimization for Sparse-Reward Environments}, 
      author={Minjae Cho and Huy Trong Tran},
      year={2026},
      eprint={2601.21391},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2601.21391}, 
}

Prerequisites

In any local folder, open a terminal and run the following command to download our package into the current directory:

git clone https://github.com/Mgineer117/IRPO/
cd IRPO

We assume that you have Conda installed. If not, please refer to the Anaconda installation guide. Python 3.11.11 was used for our code.

We recommend creating a dedicated virtual environment as follows:

conda create -n IRPO python=3.11
conda activate IRPO

Then, install the required Python packages using:

pip install -r requirements.txt

Training

Our code uses the following command to train algorithms:

python3 main.py --env-name pointmaze-v1 --algo-name irpo

where all argument values should be written in lowercase.

The above command runs the IRPO algorithm (see the algorithm figure in the repository).
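As a rough conceptual sketch of how intrinsic-reward methods address sparse rewards in general (this is not the paper's exact IRPO objective; the `beta` coefficient and intrinsic values below are illustrative assumptions), the sparse extrinsic reward is augmented with a scaled intrinsic bonus so the policy receives a dense learning signal:

```python
# Generic intrinsic-reward shaping sketch -- NOT the exact IRPO update.
# `beta` and the intrinsic values below are illustrative assumptions.
def shaped_rewards(extrinsic, intrinsic, beta=0.1):
    """Combine sparse extrinsic rewards with a scaled intrinsic bonus."""
    return [r_e + beta * r_i for r_e, r_i in zip(extrinsic, intrinsic)]

# Sparse extrinsic reward: zero on every step except the goal-reaching one.
ext = [0.0, 0.0, 0.0, 1.0]
# Hypothetical intrinsic bonus, e.g. a novelty or exploration signal.
intr = [0.5, 0.3, 0.2, 0.1]
shaped = shaped_rewards(ext, intr)
print(shaped)
```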

Logging

We support three logging options—Weights & Biases (WandB), TensorBoard, and CSV—to accommodate different user preferences. Specifically, when WandB is properly configured on your local machine, all algorithmic and parameter settings, along with real-time training metrics, are automatically logged to your WandB dashboard. Simultaneously, training results are saved locally in TensorBoard format for visualization, and evaluation metrics are exported as CSV files for easy analysis. In addition, model parameters are saved throughout training, including the best-performing checkpoint.
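For illustration, CSV-style metric logging can be as simple as the following stdlib-only sketch (this is not the repo's actual logger; the file name and metric keys are made up):

```python
import csv
import os
import tempfile

# Illustrative sketch (not the repo's actual logger): append per-step
# evaluation metrics to a CSV file for easy offline analysis.
def log_metrics_csv(path, step, metrics):
    """Append one row of metrics; write the header on first use."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["step", *metrics])
        if new_file:
            writer.writeheader()
        writer.writerow({"step": step, **metrics})

# Hypothetical metric names for demonstration only.
path = os.path.join(tempfile.mkdtemp(), "eval.csv")
log_metrics_csv(path, 0, {"return": 0.0, "success": 0.0})
log_metrics_csv(path, 1000, {"return": 12.5, "success": 0.4})

with open(path) as f:
    rows = list(csv.reader(f))
print(rows[0])  # header row: step, return, success
```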

Environments

We consider the following widely used sparse-reward environments:

(See the environment overview figure in the repository.)

Note that the environments used in our experiments are included in this repository. For details of our environment configuration, see Appendix C of our paper.
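To make the sparse-reward setting concrete, here is a toy, dependency-free example (purely illustrative, not one of the benchmark environments): the agent receives a nonzero reward only on the single step that reaches the goal, so most transitions carry no learning signal.

```python
# Toy 1-D chain illustrating sparse rewards: reward is zero on every
# step except the one that reaches the goal state at the end.
class SparseChain:
    def __init__(self, length=10):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action in {-1, +1}
        self.pos = max(0, min(self.length, self.pos + action))
        done = self.pos == self.length
        reward = 1.0 if done else 0.0  # sparse: nonzero only at the goal
        return self.pos, reward, done

env = SparseChain(length=5)
env.reset()
total = 0.0
for _ in range(5):
    _, r, done = env.step(+1)
    total += r
print(total)  # 1.0: only the final, goal-reaching step is rewarded
```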

License

This project is licensed under the MIT License. See the LICENSE.md file for details.
