Beta Version 1.25.0
Added
-
Added ProximalPolicyOptimizationClip model.
-
Added ReinforcementLearningActorCriticNeuralNetworkBaseModel model.
Changes
-
Refactored the code for ProximalPolicyOptimization, VanillaPolicyGradient, ActorCritic and AdvantageActorCritic so that they inherit from ReinforcementLearningActorCriticNeuralNetworkBaseModel.
-
Episode updates now run at the end of the final reinforcement of the same episode. Previously, they ran before the first reinforcement of the next episode.
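
The change in episode update timing can be illustrated with a minimal Python sketch. This is not the library's code: the class `EpisodicModel`, its `reinforce` and `episode_update` methods, and the `steps_per_episode` parameter are all hypothetical names chosen for illustration, and the library may decide when an episode ends in a different way.

```python
class EpisodicModel:
    """Hypothetical stand-in for a reinforcement learning model (not the library's API)."""

    def __init__(self, steps_per_episode):
        # Assumption: an episode ends after a fixed number of reinforcements.
        self.steps_per_episode = steps_per_episode
        self.current_step = 0
        self.current_episode = 1
        self.log = []

    def episode_update(self):
        # Placeholder for whatever work is done once per episode.
        self.log.append(f"episode_update(episode={self.current_episode})")

    def reinforce(self):
        self.current_step += 1
        self.log.append(
            f"reinforce(episode={self.current_episode}, step={self.current_step})"
        )
        # New behaviour: the episode update runs right after the final
        # reinforcement of the episode, instead of waiting for the first
        # reinforcement of the next episode.
        if self.current_step == self.steps_per_episode:
            self.episode_update()
            self.current_episode += 1
            self.current_step = 0


model = EpisodicModel(steps_per_episode=2)
for _ in range(3):
    model.reinforce()

for entry in model.log:
    print(entry)
```

In this sketch the update for episode 1 is logged before the first reinforcement of episode 2 is even requested, so an update is never left pending if no further reinforcement happens.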