Training with both internal and external rewards

Thanks for your interesting work.
Is it possible to train Intuitor with additional external rewards (e.g., an accuracy reward)?

Have you evaluated the performance when the model is trained with both the self-certainty reward and an accuracy reward? I could not find this in the paper, but I may have missed it.
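
To make the question concrete, here is a minimal sketch of the kind of combination I have in mind. It assumes a plain weighted sum of the two signals; the function name `combined_reward` and the weights `w_internal` and `w_external` are hypothetical and are not taken from the paper or the repository.

```python
def combined_reward(
    self_certainty: float,
    is_correct: bool,
    w_internal: float = 0.5,
    w_external: float = 0.5,
) -> float:
    """Hypothetical mix of an internal and an external reward signal.

    self_certainty: the model's self-certainty score for one response
                    (assumed to be computed elsewhere).
    is_correct:     whether the final answer matches the reference.
    """
    # External (accuracy) reward: 1.0 for a correct answer, 0.0 otherwise.
    accuracy_reward = 1.0 if is_correct else 0.0
    # Weighted sum of the internal and external signals (an assumption,
    # not a method described in the paper).
    return w_internal * self_certainty + w_external * accuracy_reward
```

Other combinations (for example, normalizing each signal before summing) would also be of interest.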

Thanks, and I appreciate your response.