The Autonomous Learning Library (`all`) is an object-oriented deep reinforcement learning library in `pytorch`. The goal of the library is to provide implementations of modern reinforcement learning algorithms that reflect the way that reinforcement learning researchers think about agent design and to provide the components necessary to build and test new ideas with minimal overhead.

## Why use `all`?

The primary reason for using `all` over its many competitors is that it contains components that allow you to *build your own* reinforcement learning agents.
We provide out-of-the-box modules for:

- [x] Custom Q-Networks, V-Networks, policy networks, and feature networks
- [x] Generic function approximation
- [x] Target networks
- [x] Polyak averaging
- [x] Experience Replay
- [x] Prioritized Experience Replay
- [x] Advantage Estimation
- [x] Generalized Advantage Estimation (GAE)
- [x] Easy parameter and learning rate scheduling
- [x] An enhanced `nn` module (includes dueling layers, noisy layers, action bounds, and the coveted `nn.Flatten`)
- [x] `gym` to `pytorch` wrappers
- [x] Atari wrappers
- [x] An `Experiment` API for comparing and evaluating agents
- [x] A `SlurmExperiment` API for running massive experiments on computing clusters
- [x] A `Writer` object for easily logging information in `tensorboard`
- [x] Plotting utilities for generating paper-worthy result plots
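Target networks and Polyak averaging, listed above, can be illustrated with a minimal sketch. This is plain Python with lists standing in for network weights, not `all`'s actual API: each target parameter takes a small step of size `tau` toward its online counterpart, so the target network trails the online network smoothly.

```python
def polyak_update(online_params, target_params, tau=0.005):
    """Soft-update each target parameter toward its online counterpart.

    new_target = tau * online + (1 - tau) * target
    With tau = 1.0 this degenerates to a hard target-network copy.
    """
    return [tau * o + (1.0 - tau) * t for o, t in zip(online_params, target_params)]


# Hypothetical parameter vectors (stand-ins for network weights).
online = [1.0, 2.0]
target = [0.0, 0.0]
for _ in range(3):
    # An exaggerated tau makes the geometric convergence visible.
    target = polyak_update(online, target, tau=0.5)
print(target)  # each target weight converges geometrically toward the online value
```

In a real agent the same update is applied to every tensor in the target network after each optimizer step; a hard target-network copy is just the `tau = 1.0` special case applied every N steps.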
Rather than being embedded in the agents, all of these modules are available for use by your own custom agents.
Additionally, the included agents accept custom versions of any of the above objects.
Have a new type of replay buffer in mind?
Code it up and pass it directly to our `DQN` and `DDPG` implementations.
Our agents were also written with readability as a primary concern, so they are easy to modify.

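The plug-in design described above, where you pass your own replay buffer straight into an agent, can be sketched as follows. This is an illustrative toy, not `all`'s real classes: `RingReplayBuffer` and `ToyQAgent` are hypothetical names, and the real `DQN`/`DDPG` constructors may differ. The point is only the interface contract: any object exposing the methods the agent calls can be swapped in.

```python
import random


class RingReplayBuffer:
    """Hypothetical fixed-size FIFO replay buffer exposing store() and sample()."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []

    def store(self, transition):
        self.buffer.append(transition)
        if len(self.buffer) > self.capacity:
            self.buffer.pop(0)  # evict the oldest transition

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


class ToyQAgent:
    """Toy agent that accepts any buffer implementing store()/sample()."""

    def __init__(self, replay_buffer, batch_size=2):
        self.replay_buffer = replay_buffer
        self.batch_size = batch_size

    def observe(self, state, action, reward, next_state):
        self.replay_buffer.store((state, action, reward, next_state))

    def train_batch(self):
        return self.replay_buffer.sample(self.batch_size)


# A custom buffer is injected rather than hard-coded inside the agent.
agent = ToyQAgent(RingReplayBuffer(capacity=3))
for i in range(5):
    agent.observe(i, 0, 1.0, i + 1)
print(len(agent.replay_buffer.buffer))  # capacity caps the buffer at 3
```

Because the buffer is a constructor argument, replacing FIFO eviction with, say, prioritized sampling requires no change to the agent itself.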
## Algorithms

As of today, `all` contains implementations of the following deep RL algorithms:

- [x] Advantage Actor-Critic (A2C)
- [x] Categorical DQN (C51)
- [x] Deep Deterministic Policy Gradient (DDPG)
- [x] Deep Q-Learning (DQN) + extensions
- [x] Proximal Policy Optimization (PPO)
- [x] Rainbow
- [x] Soft Actor-Critic (SAC)

It also contains implementations of the following "vanilla" agents, which provide useful baselines and perform better than you may expect:

- [x] Vanilla Actor-Critic
- [x] Vanilla Policy Gradient
- [x] Vanilla Q-Learning
- [x] Vanilla Sarsa

We will try to stay up-to-date with advances in the field, but we do not intend to implement every algorithm. Rather, we prefer to maintain a smaller set of high-quality agents that are well established in the field.

We have labored to make sure that our implementations produce results comparable to published results.
Here's a sampling of performance on several Atari games:

![atari40](benchmarks/atari40.png)

These results were generated using the `all.presets.atari` module, the `SlurmExperiment` utility, and the `all.experiments.plots` module.

## Example
Our agents implement a single method: `action = agent.act(state, reward)`.
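A typical interaction loop built on that single method might look like the following sketch. The `ConstantAgent` and `DummyEnv` here are made-up stand-ins, not part of `all`; they exist only to show how an environment drives `agent.act(state, reward)` each step.

```python
class ConstantAgent:
    """Stand-in agent: act() receives the current state and the previous reward."""

    def act(self, state, reward):
        return 0  # always choose action 0


class DummyEnv:
    """Stand-in episodic environment that terminates after 3 steps."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # initial state

    def step(self, action):
        self.t += 1
        done = self.t >= 3
        return self.t, 1.0, done  # next state, reward, done flag


agent = ConstantAgent()
env = DummyEnv()
state, reward, done = env.reset(), 0.0, False
total = 0.0
while not done:
    action = agent.act(state, reward)  # the single-method agent API
    state, reward, done = env.step(action)
    total += reward
print(total)  # 3.0 over the 3-step episode
```

Passing the previous reward into `act` lets the agent handle its own learning updates internally, so the driving loop stays this small.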