Renamed REINFORCE to VPG for consistency with other libraries. VPG can now also average the gradients over multiple episodes, which drastically improves performance in some cases.
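To illustrate the idea of averaging policy gradients over several episodes, here is a minimal sketch in plain Python. It is not the library's implementation: it assumes a tabular softmax policy, and the names `returns_to_go`, `episode_gradient`, and `averaged_gradient` are hypothetical, chosen only for this example.

```python
import math

def returns_to_go(rewards, gamma=0.99):
    """Discounted return from each timestep to the end of the episode."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def episode_gradient(theta, states, actions, rewards, gamma=0.99):
    """Policy gradient estimate from one episode.

    theta is a table of logits indexed as theta[state][action] (an assumption
    made for this sketch). For a softmax policy, the gradient of log pi(a|s)
    with respect to the logits of state s is onehot(a) - pi(.|s).
    """
    grad = [[0.0] * len(row) for row in theta]
    for s, a, g in zip(states, actions, returns_to_go(rewards, gamma)):
        probs = softmax(theta[s])
        for b, p in enumerate(probs):
            grad[s][b] += g * ((1.0 if b == a else 0.0) - p)
    return grad

def averaged_gradient(theta, episodes, gamma=0.99):
    """Average the per-episode gradient estimates over a batch of episodes."""
    grads = [episode_gradient(theta, s, a, r, gamma) for s, a, r in episodes]
    n = len(grads)
    return [
        [sum(g[i][j] for g in grads) / n for j in range(len(theta[i]))]
        for i in range(len(theta))
    ]
```

Each episode is passed as a `(states, actions, rewards)` tuple. Averaging over more episodes before each update reduces the variance of the gradient estimate, at the cost of fewer updates per environment step.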
Tweaked A2C to make it align better with other implementations. In particular, a new n-step buffer was added that is more accurate. There are also some small changes to make sure feature gradients are computed correctly.
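For readers unfamiliar with n-step targets, the following sketch shows the quantity such a buffer typically computes. It is an illustration under stated assumptions, not the library's buffer: `n_step_targets` is a hypothetical helper, `values` is assumed to hold one value estimate per visited state plus one for the final state, and the final entry is assumed to be 0 when the episode terminated.

```python
def n_step_targets(rewards, values, n, gamma=0.99):
    """Compute n-step bootstrapped targets for a single trajectory.

    rewards: list of length T, the reward received after each step.
    values:  list of length T + 1, value estimates for each state,
             including the state reached after the last step.
    Near the end of the trajectory fewer than n steps remain, so the
    target uses as many rewards as are available and then bootstraps.
    """
    T = len(rewards)
    targets = []
    for t in range(T):
        m = min(n, T - t)  # number of real rewards available from step t
        target = 0.0
        for k in range(m):
            target += (gamma ** k) * rewards[t + k]
        target += (gamma ** m) * values[t + m]  # bootstrap from the value estimate
        targets.append(target)
    return targets
```

The advantage used by an actor-critic update would then be `targets[t] - values[t]`.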
Includes a fully working DQN implementation and a working partial implementation of Rainbow. Also includes Actor-Critic, Sarsa, and REINFORCE implementations for classic control environments.