API Reference - ReinforcementLearningModels
Constructors
new
ReinforcementLearningModels.new{categoricalUpdateFunction: function, diagonalGaussianUpdateFunction: function, episodeUpdateFunction: function, resetFunction: function}: ReinforcementLearningModel
Parameters:
- categoricalUpdateFunction: The update function for categorical actions.
- diagonalGaussianUpdateFunction: The update function for diagonal Gaussian actions.
- episodeUpdateFunction: The episode update function for all actions.
- resetFunction: The reset function for the reinforcement learning model.
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
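A minimal sketch of building a custom model with new(). The callback argument lists shown here are an assumption (they mirror the parameters later passed to categoricalUpdate() and diagonalGaussianUpdate()); this page does not specify them.

```lua
local CustomModel = ReinforcementLearningModels.new{

	categoricalUpdateFunction = function(previousFeatureTensor, action, rewardValue, currentFeatureTensor, terminalStateValue)

		-- Custom update logic for discrete (categorical) actions goes here.

	end,

	diagonalGaussianUpdateFunction = function(previousFeatureTensor, actionNoiseTensor, rewardValue, currentFeatureTensor, terminalStateValue)

		-- Custom update logic for continuous (diagonal Gaussian) actions goes here.

	end,

	episodeUpdateFunction = function()

		-- Per-episode bookkeeping, such as clearing stored trajectories.

	end,

	resetFunction = function()

		-- Clear any internally stored values.

	end,

}
```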
DeepQLearning
ReinforcementLearningModels.DeepQLearning{Model: function, WeightContainer: WeightContainer, lambda: number, discountFactor: number}: ReinforcementLearningModel
Parameters:
- Model: The model to be used for outputting actions.
- WeightContainer: The weight container to be used to update the model’s weight tensors.
- lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, it acts like the Monte Carlo algorithm. Between 0 and 1, it behaves as a mix of the two. [Default: 0]
- discountFactor: The higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
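A minimal construction sketch. Model and WeightContainer are hypothetical placeholders assumed to be built elsewhere with the library's model and weight-container APIs, which are not covered on this page.

```lua
-- Model and WeightContainer are hypothetical placeholders; build them with the
-- library's model and weight-container APIs before passing them in.
local DQN = ReinforcementLearningModels.DeepQLearning{

	Model = Model,
	WeightContainer = WeightContainer,
	lambda = 0, -- pure Temporal Difference updates (documented default)
	discountFactor = 0.95, -- documented default

}
```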
DeepDoubleQLearningV1
ReinforcementLearningModels.DeepDoubleQLearningV1{Model: function, WeightContainer: WeightContainer, lambda: number, discountFactor: number}: ReinforcementLearningModel
Parameters:
- Model: The model to be used for outputting actions.
- WeightContainer: The weight container to be used to update the model’s weight tensors.
- lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, it acts like the Monte Carlo algorithm. Between 0 and 1, it behaves as a mix of the two. [Default: 0]
- discountFactor: The higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]
- WeightTensorArrayArray: An array containing two weight tensor arrays.
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
DeepDoubleQLearningV2
ReinforcementLearningModels.DeepDoubleQLearningV2{Model: function, WeightContainer: WeightContainer, lambda: number, averagingRate: number, discountFactor: number}: ReinforcementLearningModel
Parameters:
- Model: The model to be used for outputting actions.
- WeightContainer: The weight container to be used to update the model’s weight tensors.
- lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, it acts like the Monte Carlo algorithm. Between 0 and 1, it behaves as a mix of the two. [Default: 0]
- averagingRate: The higher the value, the faster the weights change. The value must be set between 0 and 1. [Default: 0.995]
- discountFactor: The higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
DeepClippedDoubleQLearning
ReinforcementLearningModels.DeepClippedDoubleQLearning{Model: function, WeightContainer: WeightContainer, lambda: number, discountFactor: number}: ReinforcementLearningModel
Parameters:
- Model: The model to be used for outputting actions.
- WeightContainer: The weight container to be used to update the model’s weight tensors.
- lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, it acts like the Monte Carlo algorithm. Between 0 and 1, it behaves as a mix of the two. [Default: 0]
- discountFactor: The higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]
- WeightTensorArrayArray: An array containing two weight tensor arrays.
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
DeepStateActionRewardStateAction
ReinforcementLearningModels.DeepStateActionRewardStateAction{Model: function, WeightContainer: WeightContainer, lambda: number, discountFactor: number}: ReinforcementLearningModel
Parameters:
- Model: The model to be used for outputting actions.
- WeightContainer: The weight container to be used to update the model’s weight tensors.
- lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, it acts like the Monte Carlo algorithm. Between 0 and 1, it behaves as a mix of the two. [Default: 0]
- discountFactor: The higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
DeepDoubleStateActionRewardStateActionV1
ReinforcementLearningModels.DeepDoubleStateActionRewardStateActionV1{Model: function, WeightContainer: WeightContainer, lambda: number, discountFactor: number}: ReinforcementLearningModel
Parameters:
- Model: The model to be used for outputting actions.
- WeightContainer: The weight container to be used to update the model’s weight tensors.
- lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, it acts like the Monte Carlo algorithm. Between 0 and 1, it behaves as a mix of the two. [Default: 0]
- discountFactor: The higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]
- WeightTensorArrayArray: An array containing two weight tensor arrays.
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
DeepDoubleStateActionRewardStateActionV2
ReinforcementLearningModels.DeepDoubleStateActionRewardStateActionV2{Model: function, WeightContainer: WeightContainer, lambda: number, averagingRate: number, discountFactor: number}: ReinforcementLearningModel
Parameters:
- Model: The model to be used for outputting actions.
- WeightContainer: The weight container to be used to update the model’s weight tensors.
- lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, it acts like the Monte Carlo algorithm. Between 0 and 1, it behaves as a mix of the two. [Default: 0]
- averagingRate: The higher the value, the faster the weights change. The value must be set between 0 and 1. [Default: 0.995]
- discountFactor: The higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
DeepExpectedStateActionRewardStateAction
ReinforcementLearningModels.DeepExpectedStateActionRewardStateAction{Model: function, WeightContainer: WeightContainer, epsilon: number, lambda: number, discountFactor: number}: ReinforcementLearningModel
Parameters:
- Model: The model to be used for outputting actions.
- WeightContainer: The weight container to be used to update the model’s weight tensors.
- epsilon: Controls the balance between exploration and exploitation when calculating expected Q-values. The value must be set between 0 and 1, where 0 focuses on exploitation only and 1 focuses on exploration only. [Default: 0.5]
- lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, it acts like the Monte Carlo algorithm. Between 0 and 1, it behaves as a mix of the two. [Default: 0]
- discountFactor: The higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
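A construction sketch showing the epsilon parameter. Model and WeightContainer are hypothetical placeholders assumed to be built elsewhere.

```lua
local ExpectedSARSA = ReinforcementLearningModels.DeepExpectedStateActionRewardStateAction{

	Model = Model, -- hypothetical placeholder
	WeightContainer = WeightContainer, -- hypothetical placeholder
	epsilon = 0.5, -- balance between exploration and exploitation for the expected Q-values
	lambda = 0,
	discountFactor = 0.95,

}
```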
DeepDoubleExpectedStateActionRewardStateActionV1
ReinforcementLearningModels.DeepDoubleExpectedStateActionRewardStateActionV1{Model: function, WeightContainer: WeightContainer, epsilon: number, lambda: number, discountFactor: number}: ReinforcementLearningModel
Parameters:
- Model: The model to be used for outputting actions.
- WeightContainer: The weight container to be used to update the model’s weight tensors.
- epsilon: Controls the balance between exploration and exploitation when calculating expected Q-values. The value must be set between 0 and 1, where 0 focuses on exploitation only and 1 focuses on exploration only. [Default: 0.5]
- lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, it acts like the Monte Carlo algorithm. Between 0 and 1, it behaves as a mix of the two. [Default: 0]
- discountFactor: The higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]
- WeightTensorArrayArray: An array containing two weight tensor arrays.
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
DeepDoubleExpectedStateActionRewardStateActionV2
ReinforcementLearningModels.DeepDoubleExpectedStateActionRewardStateActionV2{Model: function, WeightContainer: WeightContainer, epsilon: number, lambda: number, averagingRate: number, discountFactor: number}: ReinforcementLearningModel
Parameters:
- Model: The model to be used for outputting actions.
- WeightContainer: The weight container to be used to update the model’s weight tensors.
- epsilon: Controls the balance between exploration and exploitation when calculating expected Q-values. The value must be set between 0 and 1, where 0 focuses on exploitation only and 1 focuses on exploration only. [Default: 0.5]
- lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, it acts like the Monte Carlo algorithm. Between 0 and 1, it behaves as a mix of the two. [Default: 0]
- averagingRate: The higher the value, the faster the weights change. The value must be set between 0 and 1. [Default: 0.995]
- discountFactor: The higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
MonteCarloControl
ReinforcementLearningModels.MonteCarloControl{Model: function, WeightContainer: WeightContainer, discountFactor: number}: ReinforcementLearningModel
Parameters:
- Model: The model to be used for outputting actions.
- WeightContainer: The weight container to be used to update the model’s weight tensors.
- discountFactor: The higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
OffPolicyMonteCarloControl
ReinforcementLearningModels.OffPolicyMonteCarloControl{Model: function, WeightContainer: WeightContainer, targetPolicyFunction: string, discountFactor: number}: ReinforcementLearningModel
Parameters:
- Model: The model to be used for outputting actions.
- WeightContainer: The weight container to be used to update the model’s weight tensors.
- targetPolicyFunction: The target policy used to select actions. The policy is based on the current Q-values (or state-action values) and determines how the agent chooses actions from its current knowledge. Available options include:
  - Greedy: Selects the action with the highest Q-value for a given state. This is typically the optimal policy, assuming the Q-values are accurate.
  - Softmax: Selects actions probabilistically, where actions with higher Q-values are more likely to be chosen. The probability of selecting an action is determined by a temperature parameter that controls the exploration-exploitation trade-off.
  - StableSoftmax: A more numerically stable variant of Softmax. [Default]
- discountFactor: The higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
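A construction sketch showing the targetPolicyFunction option. Model and WeightContainer are hypothetical placeholders assumed to be built elsewhere.

```lua
local OffPolicyMC = ReinforcementLearningModels.OffPolicyMonteCarloControl{

	Model = Model,
	WeightContainer = WeightContainer,
	targetPolicyFunction = "Greedy", -- or "Softmax"; "StableSoftmax" is the documented default
	discountFactor = 0.95,

}
```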
REINFORCE
ReinforcementLearningModels.REINFORCE{Model: function, WeightContainer: WeightContainer, discountFactor: number}: ReinforcementLearningModel
Parameters:
- Model: The model to be used for outputting actions.
- WeightContainer: The weight container to be used to update the model’s weight tensors.
- discountFactor: The higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
VanillaPolicyGradient
ReinforcementLearningModels.VanillaPolicyGradient{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, AdvantageFunction: function, lambda: number, discountFactor: number}: ReinforcementLearningModel
Parameters:
- ActorModel: The actor model to be used for outputting actions.
- ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.
- CriticModel: The critic model to be used for outputting critic values.
- CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.
- AdvantageFunction: The advantage function used to update the actor model.
- lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, it acts like the Monte Carlo algorithm. Between 0 and 1, it behaves as a mix of the two. [Default: 0]
- discountFactor: The higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
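A sketch of the actor-critic constructor pattern used by this and the following constructors. All models, weight containers, and the advantage function are hypothetical placeholders assumed to be built elsewhere.

```lua
local VPG = ReinforcementLearningModels.VanillaPolicyGradient{

	ActorModel = ActorModel,
	ActorWeightContainer = ActorWeightContainer,
	CriticModel = CriticModel,
	CriticWeightContainer = CriticWeightContainer,
	AdvantageFunction = AdvantageFunction, -- hypothetical placeholder advantage function
	lambda = 0,
	discountFactor = 0.95,

}
```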
ActorCritic
ReinforcementLearningModels.ActorCritic{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, discountFactor: number}: ReinforcementLearningModel
Parameters:
- ActorModel: The actor model to be used for outputting actions.
- ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.
- CriticModel: The critic model to be used for outputting critic values.
- CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.
- discountFactor: The higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
AdvantageActorCritic
ReinforcementLearningModels.AdvantageActorCritic{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, AdvantageFunction: function, lambda: number, discountFactor: number}: ReinforcementLearningModel
Parameters:
- ActorModel: The actor model to be used for outputting actions.
- ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.
- CriticModel: The critic model to be used for outputting critic values.
- CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.
- AdvantageFunction: The advantage function used to update the actor model.
- lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, it acts like the Monte Carlo algorithm. Between 0 and 1, it behaves as a mix of the two. [Default: 0]
- discountFactor: The higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
ProximalPolicyOptimization
ReinforcementLearningModels.ProximalPolicyOptimization{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, lambda: number, discountFactor: number}: ReinforcementLearningModel
Parameters:
- ActorModel: The actor model to be used for outputting actions.
- ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.
- CriticModel: The critic model to be used for outputting critic values.
- CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.
- lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, it acts like the Monte Carlo algorithm. Between 0 and 1, it behaves as a mix of the two. [Default: 0]
- discountFactor: The higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
ProximalPolicyOptimizationClip
ReinforcementLearningModels.ProximalPolicyOptimizationClip{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, clipRatio: number, lambda: number, discountFactor: number}: ReinforcementLearningModel
Parameters:
- ActorModel: The actor model to be used for outputting actions.
- ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.
- CriticModel: The critic model to be used for outputting critic values.
- CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.
- clipRatio: A value that controls how far the new policy can move from the old policy. The value must be set between 0 and 1. [Default: 0.3]
- lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, it acts like the Monte Carlo algorithm. Between 0 and 1, it behaves as a mix of the two. [Default: 0]
- discountFactor: The higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
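A construction sketch showing the clipRatio parameter. All models and weight containers are hypothetical placeholders.

```lua
local PPOClip = ReinforcementLearningModels.ProximalPolicyOptimizationClip{

	ActorModel = ActorModel,
	ActorWeightContainer = ActorWeightContainer,
	CriticModel = CriticModel,
	CriticWeightContainer = CriticWeightContainer,
	clipRatio = 0.3, -- limits how far the new policy can move from the old one
	lambda = 0,
	discountFactor = 0.95,

}
```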
SoftActorCritic
ReinforcementLearningModels.SoftActorCritic{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, alpha: number, averagingRate: number, discountFactor: number}: ReinforcementLearningModel
Parameters:
- ActorModel: The actor model to be used for outputting actions.
- ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.
- CriticModel: The critic model to be used for outputting critic values.
- CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.
- alpha: The entropy regularization coefficient. The higher the value, the more the model explores. The value is generally set between 0 and 1. [Default: 0.1]
- averagingRate: The higher the value, the faster the weights change. The value must be set between 0 and 1. [Default: 0.995]
- discountFactor: The higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
DeepDeterministicPolicyGradient
ReinforcementLearningModels.DeepDeterministicPolicyGradient{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, averagingRate: number, discountFactor: number}: ReinforcementLearningModel
Parameters:
- ActorModel: The actor model to be used for outputting actions.
- ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.
- CriticModel: The critic model to be used for outputting critic values.
- CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.
- averagingRate: The higher the value, the faster the weights change. The value must be set between 0 and 1. [Default: 0.995]
- discountFactor: The higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
TwinDelayedDeepDeterministicPolicyGradient
ReinforcementLearningModels.TwinDelayedDeepDeterministicPolicyGradient{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, noiseClippingFactor: number, policyDelayAmount: number, averagingRate: number, discountFactor: number}: ReinforcementLearningModel
Parameters:
- ActorModel: The actor model to be used for outputting actions.
- ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.
- CriticModel: The critic model to be used for outputting critic values.
- CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.
- noiseClippingFactor: The amount of noise that is allowed in the action noise tensor. [Default: 0.5]
- policyDelayAmount: The number of update function calls the actor model waits before it is updated. [Default: 3]
- averagingRate: The higher the value, the faster the weights change. The value must be set between 0 and 1. [Default: 0.995]
- discountFactor: The higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]
Returns:
- ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
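A construction sketch using the documented defaults. All models and weight containers are hypothetical placeholders.

```lua
local TD3 = ReinforcementLearningModels.TwinDelayedDeepDeterministicPolicyGradient{

	ActorModel = ActorModel,
	ActorWeightContainer = ActorWeightContainer,
	CriticModel = CriticModel,
	CriticWeightContainer = CriticWeightContainer,
	noiseClippingFactor = 0.5, -- clamp applied to the action noise tensor
	policyDelayAmount = 3, -- the actor waits 3 update calls between its own updates
	averagingRate = 0.995,
	discountFactor = 0.95,

}
```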
Functions
categoricalUpdate()
Updates the model parameters using categoricalUpdateFunction().
ReinforcementLearningModels:categoricalUpdate{previousFeatureTensor: tensor, action: number/string, rewardValue: number, currentFeatureTensor: tensor, terminalStateValue: number}
Parameters:
- previousFeatureTensor: The previous state of the environment.
- action: The action selected.
- rewardValue: The reward gained at the current state.
- currentFeatureTensor: The current state of the environment.
- terminalStateValue: A value of 1 indicates that the current state is a terminal state. A value of 0 indicates that it is not.
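A minimal training-step sketch for a discrete-action model. RLModel stands for any ReinforcementLearningModel constructed above; the environment values and the action-selection step are hypothetical placeholders, and only the categoricalUpdate() call follows this page's API.

```lua
-- previousFeatureTensor, currentFeatureTensor, action, rewardValue and isTerminal are
-- hypothetical values produced by your own environment and action-selection logic.
RLModel:categoricalUpdate{

	previousFeatureTensor = previousFeatureTensor,
	action = action,
	rewardValue = rewardValue,
	currentFeatureTensor = currentFeatureTensor,
	terminalStateValue = isTerminal and 1 or 0, -- 1 when the episode has ended

}
```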
diagonalGaussianUpdate()
Updates the model parameters using diagonalGaussianUpdateFunction().
ReinforcementLearningModels:diagonalGaussianUpdate{previousFeatureTensor: tensor, actionNoiseTensor: tensor, rewardValue: number, currentFeatureTensor: tensor, terminalStateValue: number}
Parameters:
- previousFeatureTensor: The previous state of the environment.
- actionNoiseTensor: The tensor containing noise values for all actions.
- rewardValue: The reward gained at the current state.
- currentFeatureTensor: The current state of the environment.
- terminalStateValue: A value of 1 indicates that the current state is a terminal state. A value of 0 indicates that it is not.
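The continuous-action counterpart of the step above. The actionNoiseTensor and environment values are hypothetical placeholders.

```lua
RLModel:diagonalGaussianUpdate{

	previousFeatureTensor = previousFeatureTensor,
	actionNoiseTensor = actionNoiseTensor, -- noise applied to each continuous action
	rewardValue = rewardValue,
	currentFeatureTensor = currentFeatureTensor,
	terminalStateValue = isTerminal and 1 or 0,

}
```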
episodeUpdate()
Updates the model parameters using episodeUpdateFunction().
ReinforcementLearningModels:episodeUpdate()
reset()
Resets the model’s stored values (excluding the parameters).
ReinforcementLearningModels:reset()
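A sketch of how the per-step and per-episode calls might fit together. Whether reset() should be called after every episode depends on your setup; the loop structure and numberOfEpisodes are assumptions, not part of this page's API.

```lua
for episode = 1, numberOfEpisodes do

	-- Run the environment here, calling categoricalUpdate() or diagonalGaussianUpdate()
	-- once per step as shown above.

	RLModel:episodeUpdate() -- apply the episode-level update

	RLModel:reset() -- clear stored values; learned parameters are kept

end
```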