The current implementation of `CategoricalActor` has some bugs that should be fixed. So far I have found the following:
- The input to `tfp.distributions.Categorical` is wrong.
- The computation of the log probability in `call` is wrong.

There could be other issues, so the actor needs to be evaluated on several discrete environments.
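
For reference, here is a minimal pure-Python sketch of what the log probability of a categorical action should be when the distribution is parameterized by logits. It is not the `CategoricalActor` code. The helper names `log_softmax` and `categorical_log_prob` are hypothetical, and the guess that probabilities are being passed where logits are expected (the first positional argument of `tfp.distributions.Categorical` is `logits`) is only an assumption about one way the input could be wrong.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def categorical_log_prob(logits, action):
    """Log probability of `action` under a categorical defined by `logits`."""
    return log_softmax(logits)[action]


# Hypothetical policy output over three discrete actions.
probs = [0.7, 0.2, 0.1]
logits = [math.log(p) for p in probs]

# Correct: logits go in, so the result equals log(0.7).
correct = categorical_log_prob(logits, 0)

# Assumed failure mode: probabilities are treated as logits,
# so a second softmax is applied and the result is no longer log(0.7).
wrong = categorical_log_prob(probs, 0)
```

A check along these lines, comparing the actor's log probability against a hand-computed value for a known distribution, could be added as a unit test before running the evaluation on discrete environments.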