- Minjae Cho (Correspondence) - The Grainger College of Engineering, University of Illinois Urbana-Champaign
- Huy T. Tran - The Grainger College of Engineering, University of Illinois Urbana-Champaign
Please cite our paper if you use this code or algorithm for any part of your research or work:
@misc{cho2026intrinsicrewardpolicyoptimization,
title={Intrinsic Reward Policy Optimization for Sparse-Reward Environments},
author={Minjae Cho and Huy Trong Tran},
year={2026},
eprint={2601.21391},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2601.21391},
}
In any local folder, open a terminal and run the following commands to download our package and enter its directory:
git clone https://github.com/Mgineer117/IRPO/
cd IRPO
We assume that you have Conda installed. If not, please refer to the Anaconda installation guide. Our code was developed with Python 3.11.11.
We recommend creating a dedicated virtual environment as follows:
conda create -n IRPO python=3.11
conda activate IRPO
Then, install the required Python packages using:
pip install -r requirements.txt
Our code uses the following command to train algorithms:
python3 main.py --env-name pointmaze-v1 --algo-name irpo
where all argument values should be written in lowercase.
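As an illustrative sketch of the lowercase convention (not the repository's actual parser), argument values can be normalized at parse time. The flag names `--env-name` and `--algo-name` come from the command above; the `lowercase` helper and the parser structure are our own assumptions, and the real `main.py` may differ:

```python
import argparse


def lowercase(value: str) -> str:
    # Normalize every argument value to lowercase, matching the
    # convention that all argument values are written in lowercase.
    return value.lower()


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the flags shown above.
    parser = argparse.ArgumentParser(description="IRPO training entry point (sketch)")
    parser.add_argument("--env-name", type=lowercase, default="pointmaze-v1")
    parser.add_argument("--algo-name", type=lowercase, default="irpo")
    return parser


if __name__ == "__main__":
    # Mixed-case input is normalized before training code sees it.
    args = build_parser().parse_args(["--env-name", "PointMaze-v1", "--algo-name", "IRPO"])
    print(args.env_name, args.algo_name)  # pointmaze-v1 irpo
```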
We support three logging options—Weights & Biases (WandB), TensorBoard, and CSV—to accommodate different user preferences. Specifically, when WandB is properly configured on your local machine, all algorithmic and parameter settings, along with real-time training metrics, are automatically logged to your WandB dashboard. Simultaneously, training results are saved locally in TensorBoard format for visualization, and evaluation metrics are exported as CSV files for easy analysis. In addition, model parameters are saved during training, including those of the best-performing model.
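To give a concrete picture of the CSV export path, here is a minimal, hypothetical sketch of writing evaluation metrics to a CSV file using only the standard library. The `CSVLogger` class and the column names are our own illustration, not the repository's actual API:

```python
import csv
import tempfile
from pathlib import Path


class CSVLogger:
    """Minimal CSV logger sketch: one row per evaluation step."""

    def __init__(self, path: Path, fieldnames: list[str]):
        self.path = path
        self.fieldnames = fieldnames
        # Write the header once when the logger is created.
        with open(self.path, "w", newline="") as f:
            csv.DictWriter(f, fieldnames=fieldnames).writeheader()

    def log(self, row: dict) -> None:
        # Append a single evaluation record.
        with open(self.path, "a", newline="") as f:
            csv.DictWriter(f, fieldnames=self.fieldnames).writerow(row)


if __name__ == "__main__":
    # Hypothetical evaluation columns; actual metric names may differ.
    out = Path(tempfile.mkdtemp()) / "eval_metrics.csv"
    logger = CSVLogger(out, ["step", "mean_return", "success_rate"])
    logger.log({"step": 10_000, "mean_return": 0.12, "success_rate": 0.05})
    logger.log({"step": 20_000, "mean_return": 0.47, "success_rate": 0.31})
    with open(out) as f:
        print(len(list(csv.DictReader(f))))  # 2
```

A CSV of this shape opens directly in pandas or a spreadsheet, which is what makes it convenient for post-hoc analysis.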
We consider the following widely used sparse-reward environments:
Note that the environments we used in our experiments are included in our repo. For the details of our environment configuration, check Appendix C in our paper.
This project is licensed under the MIT License - see the LICENSE.md file for details.