AsynchronousAdvantageCritic is a base class for reinforcement learning.
The Actor and Critic child models must be created separately; use addActorCriticModel() to add them to the AsynchronousAdvantageCritic model.
The Actor and Critic models must be NeuralNetwork models. If you decide to use linear regression or logistic regression instead, it must still be constructed using the NeuralNetwork model.
Ensure the final layer of the Critic model has only one neuron, since the Critic outputs a single value estimate for a state. This is the standard setup for Critic models in the research literature.
Ensure that setActorCriticMainModelParameters() is called first so that the child models can copy the main model’s parameters. Otherwise, the main model’s parameters will be taken at random from one of the child models.
Create a new model object. If any of the arguments are nil, the default value for that argument will be used. A construction sketch follows the parameter list below.
AsynchronousAdvantageCritic.new(learningRate: number, numberOfReinforcementsPerEpisode: integer, epsilon: number, epsilonDecayFactor: number, discountFactor: number, totalNumberOfReinforcementsToUpdateMainModel: number, actionSelectionFunction: string): ModelObject
learningRate: The speed at which the model learns. The value is recommended to be between 0 and 1.
numberOfReinforcementsPerEpisode: The number of reinforcements per episode before the epsilon value decays. It is also used for the actor and critic loss calculations.
epsilon: The higher the value, the more likely it focuses on exploration over exploitation. The value must be set between 0 and 1.
epsilonDecayFactor: The higher the value, the slower the epsilon decays. The value must be set between 0 and 1.
discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1.
totalNumberOfReinforcementsToUpdateMainModel: The total number of reinforce() function calls required from all child models to update the main model.
actionSelectionFunction: The function that determines how an action is chosen. Available options are:
Maximum (Default)
Sample
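A minimal construction sketch, assuming the AsynchronousAdvantageCritic class has already been required from the library; the argument values are illustrative, not recommendations:

local MainModel = AsynchronousAdvantageCritic.new(
    0.1,       -- learningRate
    500,       -- numberOfReinforcementsPerEpisode
    0.5,       -- epsilon
    0.999,     -- epsilonDecayFactor
    0.95,      -- discountFactor
    100,       -- totalNumberOfReinforcementsToUpdateMainModel
    "Maximum"  -- actionSelectionFunction
)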
Set the model’s parameters. When any of the arguments are nil, the previous value for that argument will be used (see the sketch after the option list).
AsynchronousAdvantageCritic:setParameters(learningRate: number, numberOfReinforcementsPerEpisode: integer, epsilon: number, epsilonDecayFactor: number, discountFactor: number, totalNumberOfReinforcementsToUpdateMainModel: number, actionSelectionFunction: string)
learningRate: The speed at which the model learns. The value is recommended to be between 0 and 1.
numberOfReinforcementsPerEpisode: The number of reinforcements per episode before the epsilon value decays. It is also used for the actor and critic loss calculations.
epsilon: The higher the value, the more likely it focuses on exploration over exploitation. The value must be set between 0 and 1.
epsilonDecayFactor: The higher the value, the slower the epsilon decays. The value must be set between 0 and 1.
discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1.
totalNumberOfReinforcementsToUpdateMainModel: The total number of reinforce() function calls required from all child models to update the main model.
actionSelectionFunction: The function that determines how an action is chosen. Available options are:
Maximum
Sample
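For example, to change only the epsilon-related values mid-training (a sketch; the numbers are illustrative):

-- nil arguments keep their previous values, so only epsilon and
-- epsilonDecayFactor change here.
MainModel:setParameters(nil, nil, 0.3, 0.995)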
Add an Actor-Critic child model pair to the main model, as shown in the sketch after the parameter list.
AsynchronousAdvantageCritic:addActorCriticModel(ActorModel: ModelObject, CriticModel: ModelObject, ExperienceReplay: ExperienceReplayObject)
ActorModel: The model to be used as an Actor model.
CriticModel: The model to be used as a Critic model.
ExperienceReplay: The experience replay object.
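A wiring sketch, assuming ActorModel, CriticModel and ExperienceReplayObject variables were built beforehand (their construction is omitted here):

-- Each call registers one Actor-Critic child model pair.
-- The Critic model's final layer must contain exactly one neuron.
MainModel:addActorCriticModel(ActorModel1, CriticModel1, ExperienceReplayObject1)
MainModel:addActorCriticModel(ActorModel2, CriticModel2, ExperienceReplayObject2)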
Set the list of classes that the model can choose from.
AsynchronousAdvantageCritic:setClassesList(classesList: [])
Set the parameters of the main Actor and Critic models (a usage sketch follows getActorCriticMainModelParameters() below).
AsynchronousAdvantageCritic:setActorCriticMainModelParameters(ActorMainModelParameters: [], CriticMainModelParameters: [], applyToAllChildModels: boolean)
ActorMainModelParameters: The model parameters to be set for the main Actor model.
CriticMainModelParameters: The model parameters to be set for the main Critic model.
applyToAllChildModels: Set whether or not the main model parameters will be applied to all child models in the main model.
Get the parameters of the main Actor and Critic models.
AsynchronousAdvantageCritic:getActorCriticMainModelParameters(): [], []
ActorMainModelParameters: The model parameters from the main Actor model.
CriticMainModelParameters: The model parameters from the main Critic model.
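A sketch of seeding every child model from one set of main-model parameters; the two parameter tables are assumed to come from previously trained or saved models:

-- Set the main model's parameters and copy them to all child models.
MainModel:setActorCriticMainModelParameters(ActorMainModelParameters, CriticMainModelParameters, true)

-- Read the current main-model parameters back, e.g. for saving.
local actorParameters, criticParameters = MainModel:getActorCriticMainModelParameters()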
Reward or punish the model based on the current state of the environment (a training-loop sketch follows the return values).
AsynchronousAdvantageCritic:reinforce(currentFeatureVector: matrix, actionStandardDeviationVector: matrix, rewardValue: number, returnOriginalOutput: boolean, actorCriticModelNumber: number): integer, number -OR- Matrix
currentFeatureVector: Matrix containing data from the current state.
actionStandardDeviationVector: The vector containing the standard deviation for each action. The number of columns must match the number of actions.
rewardValue: The reward value added/subtracted from the current state (recommended value between -1 and 1, but can be larger than these values).
returnOriginalOutput: Set whether or not to return the predicted vector instead of the value with the highest probability.
actorCriticModelNumber: The model number for a model to be reinforced.
predictedLabel: A label that is predicted by the model.
value: The value of the predicted label.
-OR-
predictedVector: A matrix containing the values predicted by the model.
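A training-loop sketch for a discrete action space; getEnvironmentFeatureVector() and getReward() are hypothetical environment functions, and passing nil for actionStandardDeviationVector in the discrete case is an assumption:

for step = 1, 500 do
    local currentFeatureVector = getEnvironmentFeatureVector() -- hypothetical environment read
    local rewardValue = getReward()                            -- hypothetical reward signal
    -- Reinforce child model 1; returns the predicted label and its value.
    local predictedLabel, value = MainModel:reinforce(currentFeatureVector, nil, rewardValue, false, 1)
end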
Updates the model parameters based on categorical distribution for discrete action spaces (a usage sketch follows the parameter list).
AsynchronousAdvantageCritic:categoricalUpdate(previousFeatureVector: featureVector, action: number/string, rewardValue: number, currentFeatureVector: featureVector, actorCriticModelNumber: number)
previousFeatureVector: The previous state of the environment.
action: The action selected.
rewardValue: The reward gained at current state.
currentFeatureVector: The current state of the environment.
actorCriticModelNumber: The model number for a model to update the parameters.
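A single discrete-action update sketch; the variables are illustrative:

-- Update child model 1 from one transition (previous state, action, reward, current state).
MainModel:categoricalUpdate(previousFeatureVector, chosenAction, rewardValue, currentFeatureVector, 1)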
Updates the model parameters based on diagonal Gaussian distribution for continuous action spaces (a usage sketch follows the parameter list).
AsynchronousAdvantageCritic:diagonalGaussianUpdate(previousFeatureVector: featureVector, actionVector: vector, rewardValue: number, currentFeatureVector: featureVector, actorCriticModelNumber: number)
previousFeatureVector: The previous state of the environment.
actionVector: The action vector generated by the model.
rewardValue: The reward gained at current state.
currentFeatureVector: The current state of the environment.
actorCriticModelNumber: The model number for a model to update the parameters.
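A single continuous-action update sketch; actionVector is assumed to be the action sampled from the diagonal Gaussian policy at the previous state:

-- Update child model 1 from one continuous-action transition.
MainModel:diagonalGaussianUpdate(previousFeatureVector, actionVector, rewardValue, currentFeatureVector, 1)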
Get the current number of episodes for the selected child model.
AsynchronousAdvantageCritic:getCurrentNumberOfEpisodes(actorCriticModelNumber: number): number
Get the current number of reinforcements for the selected child model.
AsynchronousAdvantageCritic:getCurrentNumberOfReinforcements(actorCriticModelNumber: number): number
Get the current epsilon value for the selected child model.
AsynchronousAdvantageCritic:getCurrentEpsilon(actorCriticModelNumber: number): number
Get the current number of reinforcements counted towards updating the main model.
AsynchronousAdvantageCritic:getCurrentTotalNumberOfReinforcementsToUpdateMainModel(): number
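A monitoring sketch for child model 1 (variable names are illustrative):

local numberOfEpisodes = MainModel:getCurrentNumberOfEpisodes(1)
local numberOfReinforcements = MainModel:getCurrentNumberOfReinforcements(1)
local currentEpsilon = MainModel:getCurrentEpsilon(1)
local reinforcementCount = MainModel:getCurrentTotalNumberOfReinforcementsToUpdateMainModel()
print(numberOfEpisodes, numberOfReinforcements, currentEpsilon, reinforcementCount)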
Reset a single child model’s stored values (excluding the parameters).
AsynchronousAdvantageCritic:reset(actorCriticModelNumber: number)
Reset the main model’s and child models’ stored values (excluding the parameters).
AsynchronousAdvantageCritic:resetAll()
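For example:

MainModel:reset(1)   -- clear stored values for child model 1 only
MainModel:resetAll() -- clear stored values for the main model and every child model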
Destroys the model object.
AsynchronousAdvantageCritic:destroy()