Releases: cpnota/autonomous-learning-library
Evaluation mode
This release contains some minor changes to several key APIs.
Agent Evaluation Mode
We added a new method to the `Agent` interface called `eval`. `eval` is the same as `act`, except the agent does not perform any training updates. This is useful for measuring the performance of an agent at the end of a training run. Speaking of which...
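The distinction can be sketched with a toy agent. `MyAgent` and its internals below are purely illustrative, not the library's `Agent` implementation; the point is only the contract: `act` trains, `eval` does not.

```python
# Illustrative sketch (not library code) of the act/eval contract.
class MyAgent:
    def __init__(self):
        self.updates = 0  # count training updates, for illustration only

    def _choose_action(self, state):
        return 0  # trivial policy for the sketch

    def act(self, state, reward):
        self.updates += 1                  # act() performs a training update...
        return self._choose_action(state)

    def eval(self, state, reward):
        return self._choose_action(state)  # ...eval() only selects an action

agent = MyAgent()
agent.act(None, 0.0)
agent.eval(None, 0.0)
print(agent.updates)  # only act() triggered an update
```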
Experiment Refactoring: Train/Test
We completely refactored the `all.experiments` module. First, the primary public entry point is now a function called `run_experiment`. Under the hood, there is a new `Experiment` interface:
```python
from abc import ABC, abstractmethod

import numpy as np


class Experiment(ABC):
    '''An Experiment manages the basic train/test loop and logs results.'''

    @property
    @abstractmethod
    def frame(self):
        '''The index of the current training frame.'''

    @property
    @abstractmethod
    def episode(self):
        '''The index of the current training episode.'''

    @abstractmethod
    def train(self, frames=np.inf, episodes=np.inf):
        '''
        Train the agent for a certain number of frames or episodes.
        If both frames and episodes are specified, then the training loop will exit
        when either condition is satisfied.

        Args:
            frames (int): The maximum number of training frames.
            episodes (int): The maximum number of training episodes.
        '''

    @abstractmethod
    def test(self, episodes=100):
        '''
        Test the agent in eval mode for a certain number of episodes.

        Args:
            episodes (int): The number of test episodes.

        Returns:
            list(float): A list of all returns received during testing.
        '''
```

Notice the new method, `experiment.test()`. This method runs the agent in eval mode for a certain number of episodes and logs summary statistics (the mean and std of the returns).
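The `train()` exit condition (stop when *either* the frame or the episode budget is exhausted) can be illustrated with a toy loop. `train_loop` below is a stand-in, not the library's implementation, and `frames_per_episode` is an invented parameter for the sketch.

```python
import numpy as np

def train_loop(frames=np.inf, episodes=np.inf, frames_per_episode=10):
    """Toy stand-in for Experiment.train(): exits when either limit is hit."""
    frame = 0
    episode = 0
    while frame < frames and episode < episodes:
        frame += frames_per_episode  # simulate one full episode of experience
        episode += 1
    return frame, episode

print(train_loop(frames=100))   # stops on the frame budget
print(train_loop(episodes=5))   # stops on the episode budget
```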
Approximation: no_grad vs. eval
Finally, we clarified the usage of `Approximation.eval(*inputs)` by adding an additional method, `Approximation.no_grad(*inputs)`. `eval()` both puts the network in evaluation mode and runs the forward pass with `torch.no_grad()`. `no_grad()` simply runs a forward pass in the current mode. The various `Policy` implementations were also adjusted to correctly execute the greedy behavior in eval mode.
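A minimal sketch of that contract, using a stand-in class rather than the library's `Approximation` (and a plain dict in place of a real forward pass): `eval()` switches to evaluation mode *and* skips graph construction, while `no_grad()` only skips graph construction. Restoring the previous mode after `eval()` is a choice made for this sketch, not a documented behavior.

```python
# Illustrative stand-in, not the library's Approximation class.
class ToyApproximation:
    def __init__(self):
        self.training = True  # like torch's module.training flag

    def _forward(self, x, grad):
        return {'mode': 'train' if self.training else 'eval', 'grad': grad}

    def eval(self, x):
        previous = self.training
        self.training = False                # evaluation mode (e.g. fixes BatchNorm)
        out = self._forward(x, grad=False)   # forward pass without a graph
        self.training = previous             # sketch-only: restore prior mode
        return out

    def no_grad(self, x):
        return self._forward(x, grad=False)  # current mode, no graph

f = ToyApproximation()
print(f.eval(0))     # evaluation mode, no gradients
print(f.no_grad(0))  # still training mode, no gradients
```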
0.4.0
Plots
Small but important update!
- Added an `all.experiments.plot` module, with a `plot_returns_100` function that accepts a `runs` directory and plots the contained results.
- Tweaked the `a2c` Atari preset to match the configuration of the other algorithms better.
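The name suggests the plot shows the mean return over the trailing 100 episodes, a common RL reporting convention — that interpretation is an assumption, and `returns_100` below is an illustrative helper, not library code.

```python
import numpy as np

def returns_100(returns):
    """Mean of the trailing 100 episode returns (assumed statistic, not
    the library's implementation)."""
    returns = np.asarray(returns, dtype=float)
    return np.array([returns[max(0, i - 99):i + 1].mean()
                     for i in range(len(returns))])

smoothed = returns_100([1.0] * 50 + [2.0] * 150)
print(smoothed[0], smoothed[-1])
```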
C51
Unification
This release contains several usability enhancements! The biggest change, however, is a refactor. The policy classes now extend from `Approximation`. This means that things like target networks, learning rate schedulers, and model saving are all handled in one place!
The full list of changes is:
- Refactored experiment API (#88)
- Policies inherit from `Approximation` (#89)
- Models now save themselves automatically every 200 updates. Also, you can load models and watch them play in each environment! (#90)
- Automatically set the temperature in SAC (#91)
- Schedule learning rates and other parameters (#92)
- SAC bugfix
- Refactored usage of target networks. Now there is a difference between `eval()` and `target()`: the former runs a forward pass of the current network, the latter does so on the target network, each without creating a computation graph. (#94)
- Tweaked the `AdvantageBuffer` API. Also fixed a minor bug in A2C (#95)
- Report the best returns so far in a separate metric (#96)
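The `eval()`/`target()` split can be sketched with a toy value network. `ToyQNetwork` and its scalar "weights" are invented for illustration; the point is only which set of parameters each call reads.

```python
# Toy illustration (not library code) of the eval()/target() distinction:
# eval() queries the current network, target() the lagging target copy,
# both conceptually without building a computation graph.
class ToyQNetwork:
    def __init__(self):
        self.weight = 1.0         # "current" network parameter
        self.target_weight = 1.0  # lagging copy used for TD targets

    def eval(self, x):
        return self.weight * x          # forward pass of the current network

    def target(self, x):
        return self.target_weight * x   # forward pass of the target network

    def step(self):
        self.weight += 0.5              # pretend a gradient update happened

q = ToyQNetwork()
q.step()
print(q.eval(2.0), q.target(2.0))  # current and target now disagree
```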
SAC Hotfix
A bug in `SoftDeterministicPolicy` was slowing learning and causing numerical instability in some cases. This release fixes that.
SAC
Added Soft-Actor Critic (SAC). SAC is a state-of-the-art algorithm for continuous control based on the max-entropy RL framework.
PPO + Vanilla
PPO and Vanilla release!
- Add PPO, one of the most popular modern RL algorithms.
- Add the `Vanilla` series agents: "vanilla" implementations of actor-critic, Sarsa, Q-learning, and REINFORCE. These algorithms are all prefixed with the letter "v" in the `agents` folder.
DDPG
This release introduces continuous policies and agents, including DDPG. Also includes a number of quality-of-life improvements:
- Add a `continuous` agent suite
- Add a `Gaussian` policy
- Add a `DeterministicPolicy`
- Introduce an `Approximation` base class from which `QNetwork`, `VNetwork`, etc. are derived
- Convert the `layers` module to `all.nn`, which extends `torch.nn` with custom layers added, to make crafting unique networks easier
- Introduce the `DDPG` agent
act
This release contains a bunch of changes under the hood. The agent API was simplified down to a single method, `action = agent.act(state, reward)`. To accompany this change, `State` was added as a first-class object. Terminal states now have `state.mask` set to 0, whereas before terminal states were represented by `None`.
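The simplified loop can be sketched as follows. The `State` dataclass, `run_episode`, and `CountingAgent` here are stand-ins invented for illustration, not the library's classes; the sketch only shows the single-method API and the `mask == 0` terminal convention.

```python
# Minimal sketch of the simplified agent API (illustrative, not library code).
from dataclasses import dataclass

@dataclass
class State:
    observation: object
    mask: int = 1  # 0 marks a terminal state (instead of passing None)

def run_episode(agent, env_steps):
    for observation, reward, done in env_steps:
        state = State(observation, mask=0 if done else 1)
        agent.act(state, reward)  # single entry point: select action + train

class CountingAgent:
    def __init__(self):
        self.terminal_states = 0
    def act(self, state, reward):
        self.terminal_states += (state.mask == 0)
        return 0

agent = CountingAgent()
run_episode(agent, [(0, 0.0, False), (1, 1.0, False), (2, 1.0, True)])
print(agent.terminal_states)  # the final state carried mask == 0
```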
Another major addition is Slurm support, in particular to aid in running on gypsum. The `SlurmExperiment` API handles the creation of the appropriate `.sh` files, output, etc., so experiments can be run on Slurm by writing a single Python script! No more writing `.sh` files by hand! Examples can be found in the `demos` folder.
There were a few other minor changes as well.
Change log:
- Simplified agent API to only include `act` (#56)
- Added the `State` object (#51)
- Added `SlurmExperiment` for running on gypsum (#53)
- Updated the local and release scripts, and added Slurm demos (#54)
- Tweaked parameter order in replay buffers (#59)
- Improved shared feature handling (#63)
- Made `write_loss` togglable (#64)
- Tweaked default hyperparameters