Conversation
This change does not yet pass tests, but is 90% complete.
Can you add a little bit more explanation for the design here? I'm concerned about using an ADT as the blanket input to policies, which makes the interface pretty complicated even in the simplest use cases.
The core motivation here is to provide a way for recurrent and non-recurrent policies to share the same API at optimization time. This PR only adds the bare minimum fields needed for recurrent policies to have reasonable
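To make the trade-off under discussion concrete, here is a minimal sketch of how one input type could serve both kinds of policy. Every name in it (`PolicyInput`, `hidden_state`, `FeedForwardPolicy`, `RecurrentPolicy`, `rollout`) is hypothetical and chosen for illustration; none of it is taken from this PR, and plain floats stand in for tensors.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PolicyInput:
    """Hypothetical input container; field names are illustrative only."""
    observations: List[float]
    # Only recurrent policies read this; None means no state is carried over.
    hidden_state: Optional[float] = None


class FeedForwardPolicy:
    """Non-recurrent policy: ignores hidden_state entirely."""

    def forward(self, policy_input: PolicyInput) -> Tuple[float, Optional[float]]:
        action = sum(policy_input.observations)
        return action, None


class RecurrentPolicy:
    """Recurrent policy: folds the previous hidden state into its output."""

    def forward(self, policy_input: PolicyInput) -> Tuple[float, Optional[float]]:
        previous = policy_input.hidden_state or 0.0
        new_state = 0.5 * previous + sum(policy_input.observations)
        return new_state, new_state


def rollout(policy, observation_steps: List[List[float]]) -> List[float]:
    """Optimizer-side loop that is identical for both policy kinds."""
    state = None
    actions = []
    for observations in observation_steps:
        action, state = policy.forward(PolicyInput(observations, state))
        actions.append(action)
    return actions
```

In this sketch the simple case only has to wrap its observations in `PolicyInput`, while the calling loop never branches on the policy type. Whether that wrapping cost is acceptable is the question raised in the review comment.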
cnn_output = self._cnn_module(observations)
mlp_output = self._mlp_module(cnn_output)[0]
logits = torch.softmax(mlp_output, axis=1)
dist = torch.probability.Categorical(logits=logits)
Should this be torch.distributions.Categorical?
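For reference, a small sketch of the suggested call using `torch.distributions.Categorical`. The tensor values are made up and stand in for `mlp_output`. One related point that may be worth checking in the snippet under review: it passes softmax output as `logits=`, and `Categorical` normalizes logits itself, so the resulting distribution would differ from the softmax probabilities. The two options below avoid that.

```python
import torch
from torch.distributions import Categorical

# Hypothetical stand-in for mlp_output: a batch of 2 observations, 3 actions.
mlp_output = torch.tensor([[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]])

# Option 1: pass the raw scores as logits; Categorical normalizes them itself.
dist_from_logits = Categorical(logits=mlp_output)

# Option 2: apply softmax first, then pass the result as probabilities.
probs = torch.softmax(mlp_output, dim=1)
dist_from_probs = Categorical(probs=probs)

# Sample one action per batch row and compute its log-probability.
actions = dist_from_logits.sample()
log_probs = dist_from_logits.log_prob(actions)
```

Both options describe the same distribution; passing raw scores as logits skips the explicit softmax.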
089c20f to 017274f
017274f to df3a137
WIP torch optimizer refactor WIP torch optimizer refactor WIP