Releases: cpnota/autonomous-learning-library
Evaluation mode
This release contains some minor changes to several key APIs.
Agent Evaluation Mode
We added a new method to the `Agent` interface called `eval`. `eval` is the same as `act`, except the agent does not perform any training updates. This is useful for measuring the performance of an agent at the end of a training run. Speaking of which...
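The distinction can be sketched with a toy agent. `MyAgent` and its internals below are purely illustrative, not the library's `Agent` implementation; the point is only the contract: `act` trains, `eval` does not.

```python
# Illustrative sketch (not library code) of the act/eval contract.
class MyAgent:
    def __init__(self):
        self.updates = 0  # count training updates, for illustration only

    def _choose_action(self, state):
        return 0  # trivial policy for the sketch

    def act(self, state, reward):
        self.updates += 1                  # act() performs a training update...
        return self._choose_action(state)

    def eval(self, state, reward):
        return self._choose_action(state)  # ...eval() only selects an action

agent = MyAgent()
agent.act(None, 0.0)
agent.eval(None, 0.0)
print(agent.updates)  # only act() triggered an update
```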
Experiment Refactoring: Train/Test
We completely refactored the `all.experiments` module. First, the primary public entry point is now a function called `run_experiment`. Under the hood, there is a new `Experiment` interface:
```python
from abc import ABC, abstractmethod

import numpy as np


class Experiment(ABC):
    '''An Experiment manages the basic train/test loop and logs results.'''

    @property
    @abstractmethod
    def frame(self):
        '''The index of the current training frame.'''

    @property
    @abstractmethod
    def episode(self):
        '''The index of the current training episode.'''

    @abstractmethod
    def train(self, frames=np.inf, episodes=np.inf):
        '''
        Train the agent for a certain number of frames or episodes.
        If both frames and episodes are specified, then the training loop will exit
        when either condition is satisfied.

        Args:
            frames (int): The maximum number of training frames.
            episodes (int): The maximum number of training episodes.
        '''

    @abstractmethod
    def test(self, episodes=100):
        '''
        Test the agent in eval mode for a certain number of episodes.

        Args:
            episodes (int): The number of test episodes.

        Returns:
            list(float): A list of all returns received during testing.
        '''
```

Notice the new method, `experiment.test()`. This method runs the agent in eval mode for a certain number of episodes and logs summary statistics (the mean and std of the returns).
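The `train()` exit condition (stop when *either* the frame or the episode budget is exhausted) can be illustrated with a toy loop. `train_loop` below is a stand-in, not the library's implementation, and `frames_per_episode` is an invented parameter for the sketch.

```python
import numpy as np

def train_loop(frames=np.inf, episodes=np.inf, frames_per_episode=10):
    """Toy stand-in for Experiment.train(): exits when either limit is hit."""
    frame = 0
    episode = 0
    while frame < frames and episode < episodes:
        frame += frames_per_episode  # simulate one full episode of experience
        episode += 1
    return frame, episode

print(train_loop(frames=100))   # stops on the frame budget
print(train_loop(episodes=5))   # stops on the episode budget
```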
Approximation: no_grad vs. eval
Finally, we clarified the usage of `Approximation.eval(*inputs)` by adding an additional method, `Approximation.no_grad(*inputs)`. `eval()` both puts the network in evaluation mode and runs the forward pass with `torch.no_grad()`. `no_grad()` simply runs a forward pass in the current mode. The various `Policy` implementations were also adjusted to correctly execute the greedy behavior in eval mode.
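A minimal sketch of that contract, using a stand-in class rather than the library's `Approximation` (and a plain dict in place of a real forward pass): `eval()` switches to evaluation mode *and* skips graph construction, while `no_grad()` only skips graph construction. Restoring the previous mode after `eval()` is a choice made for this sketch, not a documented behavior.

```python
# Illustrative stand-in, not the library's Approximation class.
class ToyApproximation:
    def __init__(self):
        self.training = True  # like torch's module.training flag

    def _forward(self, x, grad):
        return {'mode': 'train' if self.training else 'eval', 'grad': grad}

    def eval(self, x):
        previous = self.training
        self.training = False                # evaluation mode (e.g. fixes BatchNorm)
        out = self._forward(x, grad=False)   # forward pass without a graph
        self.training = previous             # sketch-only: restore prior mode
        return out

    def no_grad(self, x):
        return self._forward(x, grad=False)  # current mode, no graph

f = ToyApproximation()
print(f.eval(0))     # evaluation mode, no gradients
print(f.no_grad(0))  # still training mode, no gradients
```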
0.4.0
Plots
Small but important update!
- Added an `all.experiments.plot` module, with a `plot_returns_100` function that accepts a `runs` directory and plots the contained results.
- Tweaked the `a2c` Atari preset to match the configuration of the other algorithms better.
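The name suggests the plot shows the mean return over the trailing 100 episodes, a common RL reporting convention — that interpretation is an assumption, and `returns_100` below is an illustrative helper, not library code.

```python
import numpy as np

def returns_100(returns):
    """Mean of the trailing 100 episode returns (assumed statistic, not
    the library's implementation)."""
    returns = np.asarray(returns, dtype=float)
    return np.array([returns[max(0, i - 99):i + 1].mean()
                     for i in range(len(returns))])

smoothed = returns_100([1.0] * 50 + [2.0] * 150)
print(smoothed[0], smoothed[-1])
```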
C51
Unification
This release contains several usability enhancements! The biggest change, however, is a refactor. The policy classes now extend from `Approximation`. This means that things like target networks, learning rate schedulers, and model saving are all handled in one place!
The full list of changes is:
- Refactored experiment API (#88)
- Policies inherit from `Approximation` (#89)
- Models now save themselves automatically every 200 updates. Also, you can load models and watch them play in each environment! (#90)
- Automatically set the temperature in SAC (#91)
- Schedule learning rates and other parameters (#92)
- SAC bugfix
- Refactored usage of target networks. Now there is a difference between `eval()` and `target()`: the former runs a forward pass of the current network, the latter does so on the target network, each without creating a computation graph. (#94)
- Tweaked the `AdvantageBuffer` API. Also fixed a minor bug in A2C (#95)
- Report the best returns so far in a separate metric (#96)
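The `eval()`/`target()` split can be sketched with a toy value network. `ToyQNetwork` and its scalar "weights" are invented for illustration; the point is only which set of parameters each call reads.

```python
# Toy illustration (not library code) of the eval()/target() distinction:
# eval() queries the current network, target() the lagging target copy,
# both conceptually without building a computation graph.
class ToyQNetwork:
    def __init__(self):
        self.weight = 1.0         # "current" network parameter
        self.target_weight = 1.0  # lagging copy used for TD targets

    def eval(self, x):
        return self.weight * x          # forward pass of the current network

    def target(self, x):
        return self.target_weight * x   # forward pass of the target network

    def step(self):
        self.weight += 0.5              # pretend a gradient update happened

q = ToyQNetwork()
q.step()
print(q.eval(2.0), q.target(2.0))  # current and target now disagree
```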
SAC Hotfix
A bug in `SoftDeterministicPolicy` was slowing learning and causing numerical instability in some cases. This release fixes that.
SAC
Added Soft-Actor Critic (SAC). SAC is a state-of-the-art algorithm for continuous control based on the max-entropy RL framework.
PPO + Vanilla
PPO and Vanilla release!
- Add PPO, one of the most popular modern RL algorithms.
- Add the `Vanilla` series agents: "vanilla" implementations of actor-critic, Sarsa, Q-learning, and REINFORCE. These algorithms are all prefixed with the letter "v" in the `agents` folder.
DDPG
This release introduces continuous policies and agents, including DDPG. Also includes a number of quality-of-life improvements:
- Add a `continuous` agent suite
- Add a `Gaussian` policy
- Add a `DeterministicPolicy`
- Introduce an `Approximation` base class from which `QNetwork`, `VNetwork`, etc. are derived
- Convert the `layers` module to `all.nn`, which extends `torch.nn` with custom layers added, to make crafting unique networks easier
- Introduce the `DDPG` agent
act
This release contains a bunch of changes under the hood. The agent API was simplified down to a single method, `action = agent.act(state, reward)`. To accompany this change, `State` was added as a first-class object. Terminal states now have `state.mask` set to 0, whereas before terminal states were represented by `None`.
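The simplified loop can be sketched as follows. The `State` dataclass, `run_episode`, and `CountingAgent` here are stand-ins invented for illustration, not the library's classes; the sketch only shows the single-method API and the `mask == 0` terminal convention.

```python
# Minimal sketch of the simplified agent API (illustrative, not library code).
from dataclasses import dataclass

@dataclass
class State:
    observation: object
    mask: int = 1  # 0 marks a terminal state (instead of passing None)

def run_episode(agent, env_steps):
    for observation, reward, done in env_steps:
        state = State(observation, mask=0 if done else 1)
        agent.act(state, reward)  # single entry point: select action + train

class CountingAgent:
    def __init__(self):
        self.terminal_states = 0
    def act(self, state, reward):
        self.terminal_states += (state.mask == 0)
        return 0

agent = CountingAgent()
run_episode(agent, [(0, 0.0, False), (1, 1.0, False), (2, 1.0, True)])
print(agent.terminal_states)  # the final state carried mask == 0
```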
Another major addition is Slurm support, in particular to aid in running on gypsum. The `SlurmExperiment` API handles the creation of the appropriate `.sh` files, output, etc., so experiments can be run on Slurm by writing a single Python script! No more writing `.sh` files by hand! Examples can be found in the `demos` folder.
There were a few other minor changes as well.
Change log:
- Simplified agent API to only include `act` (#56)
- Added the `State` object (#51)
- Added `SlurmExperiment` for running on gypsum (#53)
- Updated the local and release scripts, and added Slurm demos (#54)
- Tweaked parameter order in replay buffers (#59)
- Improved shared feature handling (#63)
- Made `write_loss` togglable (#64)
- Tweaked default hyperparameters