API Reference - ReinforcementLearningModels

Constructors

new


ReinforcementLearningModels.new{categoricalUpdateFunction: function, diagonalGaussianUpdateFunction: function, episodeUpdateFunction: function, resetFunction: function}: ReinforcementLearningModel

Parameters:

  • categoricalUpdateFunction: The update function for categorical actions.

  • diagonalGaussianUpdateFunction: The update function for diagonal Gaussian actions.

  • episodeUpdateFunction: The episode update function for all actions.

  • resetFunction: The reset function for the reinforcement learning model.

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
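
Below is a minimal sketch of wiring custom update logic into this constructor. The require path and the argument lists of the supplied functions are assumptions inferred from the categoricalUpdate() and diagonalGaussianUpdate() entries later in this reference, and the function bodies are placeholders rather than a working algorithm.

local ReinforcementLearningModels = require(script.Parent.ReinforcementLearningModels) -- assumed require path; adjust to your project layout

local CustomModel = ReinforcementLearningModels.new{

	categoricalUpdateFunction = function(previousFeatureTensor, action, rewardValue, currentFeatureTensor, terminalStateValue)
		-- Custom update rule for categorical (discrete) actions goes here.
	end,

	diagonalGaussianUpdateFunction = function(previousFeatureTensor, actionNoiseTensor, rewardValue, currentFeatureTensor, terminalStateValue)
		-- Custom update rule for diagonal Gaussian (continuous) actions goes here.
	end,

	episodeUpdateFunction = function()
		-- Logic that runs once per episode, such as applying accumulated updates.
	end,

	resetFunction = function()
		-- Clear any values stored between steps.
	end,

}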

DeepQLearning


ReinforcementLearningModels.DeepQLearning{Model: function, WeightContainer: WeightContainer, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
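
A hypothetical construction call is sketched below. Model and WeightContainer stand in for a forward-pass function and a weight container created elsewhere with the library; building them is outside the scope of this page.

local DeepQLearningModel = ReinforcementLearningModels.DeepQLearning{

	Model = Model, -- forward-pass function (assumed to exist)
	WeightContainer = WeightContainer, -- holds the model's weight tensors (assumed to exist)
	lambda = 0, -- pure Temporal Difference updates
	discountFactor = 0.95, -- favour long-term rewards fairly strongly

}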

DeepDoubleQLearningV1


ReinforcementLearningModels.DeepDoubleQLearningV1{Model: function, WeightContainer: WeightContainer, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

  • WeightTensorArrayArray: An array containing two weight tensor arrays.

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

DeepDoubleQLearningV2


ReinforcementLearningModels.DeepDoubleQLearningV2{Model: function, WeightContainer: WeightContainer, lambda: number, averagingRate: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • averagingRate: The higher the value, the faster the weights change. The value must be set between 0 and 1. [Default: 0.995]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
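
For example, a hedged construction sketch that sets the averaging rate explicitly might look like this (Model and WeightContainer are assumed to have been created beforehand):

local DeepDoubleQLearningModel = ReinforcementLearningModels.DeepDoubleQLearningV2{

	Model = Model,
	WeightContainer = WeightContainer,
	averagingRate = 0.995, -- how quickly the second set of weights follows the first
	discountFactor = 0.95,

}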

DeepClippedDoubleQLearning


ReinforcementLearningModels.DeepClippedDoubleQLearning{Model: function, WeightContainer: WeightContainer, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

  • WeightTensorArrayArray: An array containing two weight tensor arrays.

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

DeepStateActionRewardStateAction


ReinforcementLearningModels.DeepStateActionRewardStateAction{Model: function, WeightContainer: WeightContainer, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

DeepDoubleStateActionRewardStateActionV1


ReinforcementLearningModels.DeepDoubleStateActionRewardStateActionV1{Model: function, WeightContainer: WeightContainer, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

  • WeightTensorArrayArray: An array containing two weight tensor arrays.

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

DeepDoubleStateActionRewardStateActionV2


ReinforcementLearningModels.DeepDoubleStateActionRewardStateActionV2{Model: function, WeightContainer: WeightContainer, lambda: number, averagingRate: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • averagingRate: The higher the value, the faster the weights change. The value must be set between 0 and 1. [Default: 0.995]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

DeepExpectedStateActionRewardStateAction


ReinforcementLearningModels.DeepExpectedStateActionRewardStateAction{Model: function, WeightContainer: WeightContainer, epsilon: number, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • epsilon: Controls the balance between exploration and exploitation when calculating expected Q-values. The value must be set between 0 and 1, where 0 focuses on exploitation only and 1 focuses on exploration only. [Default: 0.5]

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
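
A hypothetical construction call is sketched below; Model and WeightContainer are assumed to have been created beforehand.

local DeepExpectedSARSAModel = ReinforcementLearningModels.DeepExpectedStateActionRewardStateAction{

	Model = Model,
	WeightContainer = WeightContainer,
	epsilon = 0.5, -- weight exploration and exploitation equally in the expected Q-value
	discountFactor = 0.95,

}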

DeepDoubleExpectedStateActionRewardStateActionV1


ReinforcementLearningModels.DeepDoubleExpectedStateActionRewardStateActionV1{Model: function, WeightContainer: WeightContainer, epsilon: number, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • epsilon: Controls the balance between exploration and exploitation when calculating expected Q-values. The value must be set between 0 and 1, where 0 focuses on exploitation only and 1 focuses on exploration only. [Default: 0.5]

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

  • WeightTensorArrayArray: An array containing two weight tensor arrays.

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

DeepDoubleExpectedStateActionRewardStateActionV2


ReinforcementLearningModels.DeepDoubleExpectedStateActionRewardStateActionV2{Model: function, WeightContainer: WeightContainer, epsilon: number, lambda: number, averagingRate: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • epsilon: Controls the balance between exploration and exploitation when calculating expected Q-values. The value must be set between 0 and 1, where 0 focuses on exploitation only and 1 focuses on exploration only. [Default: 0.5]

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • averagingRate: The higher the value, the faster the weights change. The value must be set between 0 and 1. [Default: 0.995]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

MonteCarloControl


ReinforcementLearningModels.MonteCarloControl{Model: function, WeightContainer: WeightContainer, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

OffPolicyMonteCarloControl


ReinforcementLearningModels.OffPolicyMonteCarloControl{Model: function, WeightContainer: WeightContainer, targetPolicyFunction: string, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • targetPolicyFunction: The target policy used to select actions, chosen by name. The policy is based on the current Q-values (or state-action values) and determines how the agent chooses actions from its current knowledge. Available options include:

    • Greedy: Selects the action with the highest Q-value for a given state. This is typically the optimal policy, assuming the Q-values are accurate.

    • Softmax: Selects actions probabilistically, where actions with higher Q-values are more likely to be chosen. The probability of selecting an action is determined by a temperature parameter that controls the exploration-exploitation trade-off.

    • StableSoftmax: A numerically more stable variant of Softmax. (Default)

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
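
A hypothetical construction call with an explicit target policy is sketched below; Model and WeightContainer are assumed to have been created beforehand.

local OffPolicyMonteCarloControlModel = ReinforcementLearningModels.OffPolicyMonteCarloControl{

	Model = Model,
	WeightContainer = WeightContainer,
	targetPolicyFunction = "Greedy", -- alternatives: "Softmax", "StableSoftmax"
	discountFactor = 0.95,

}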

REINFORCE


ReinforcementLearningModels.REINFORCE{Model: function, WeightContainer: WeightContainer, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
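
REINFORCE is an episodic algorithm, so the usual workflow is to feed in steps during the episode and apply the accumulated information once the episode ends. A hedged construction sketch is shown below; Model and WeightContainer are assumed to have been created beforehand.

local REINFORCEModel = ReinforcementLearningModels.REINFORCE{

	Model = Model,
	WeightContainer = WeightContainer,
	discountFactor = 0.95,

}

-- Remember to call REINFORCEModel:episodeUpdate() at the end of each episode (see the Functions section below).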

VanillaPolicyGradient


ReinforcementLearningModels.VanillaPolicyGradient{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, AdvantageFunction: function, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • ActorModel: The actor model to be used for outputting actions.

  • ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.

  • CriticModel: The critic model to be used for outputting critic values.

  • CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.

  • AdvantageFunction: The advantage function used to update the actor model.

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
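
Actor-critic style constructors take separate actor and critic model/weight-container pairs. A hypothetical construction call is sketched below; the actor and critic models, their weight containers, and the advantage function are all assumed to have been created elsewhere.

local VanillaPolicyGradientModel = ReinforcementLearningModels.VanillaPolicyGradient{

	ActorModel = ActorModel,
	ActorWeightContainer = ActorWeightContainer,
	CriticModel = CriticModel,
	CriticWeightContainer = CriticWeightContainer,
	AdvantageFunction = AdvantageFunction, -- assumed to exist
	lambda = 0,
	discountFactor = 0.95,

}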

ActorCritic


ReinforcementLearningModels.ActorCritic{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • ActorModel: The actor model to be used for outputting actions.

  • ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.

  • CriticModel: The critic model to be used for outputting critic values.

  • CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

AdvantageActorCritic


ReinforcementLearningModels.AdvantageActorCritic{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, AdvantageFunction: function, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • ActorModel: The actor model to be used for outputting actions.

  • ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.

  • CriticModel: The critic model to be used for outputting critic values.

  • CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

ProximalPolicyOptimization


ReinforcementLearningModels.ProximalPolicyOptimization{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • ActorModel: The actor model to be used for outputting actions.

  • ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.

  • CriticModel: The critic model to be used for outputting critic values.

  • CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

ProximalPolicyOptimizationClip


ReinforcementLearningModels.ProximalPolicyOptimizationClip{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, clipRatio: number, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • ActorModel: The actor model to be used for outputting actions.

  • ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.

  • CriticModel: The critic model to be used for outputting critic values.

  • CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.

  • clipRatio: A value that controls how far the new policy can deviate from the old policy. The value must be set between 0 and 1. [Default: 0.3]

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
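
A hypothetical construction call is sketched below; the actor and critic models and their weight containers are assumed to have been created beforehand.

local ProximalPolicyOptimizationClipModel = ReinforcementLearningModels.ProximalPolicyOptimizationClip{

	ActorModel = ActorModel,
	ActorWeightContainer = ActorWeightContainer,
	CriticModel = CriticModel,
	CriticWeightContainer = CriticWeightContainer,
	clipRatio = 0.3, -- smaller values keep the new policy closer to the old one
	discountFactor = 0.95,

}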

SoftActorCritic


ReinforcementLearningModels.SoftActorCritic{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, alpha: number, averagingRate: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • ActorModel: The actor model to be used for outputting actions.

  • ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.

  • CriticModel: The critic model to be used for outputting critic values.

  • CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.

  • alpha: The entropy regularization coefficient. The higher the value, the more the model explores. Generally, the value is set between 0 and 1. [Default: 0.1]

  • averagingRate: The higher the value, the faster the weights change. The value must be set between 0 and 1. [Default: 0.995]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
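
A hypothetical construction call is sketched below; the actor and critic models and their weight containers are assumed to have been created beforehand.

local SoftActorCriticModel = ReinforcementLearningModels.SoftActorCritic{

	ActorModel = ActorModel,
	ActorWeightContainer = ActorWeightContainer,
	CriticModel = CriticModel,
	CriticWeightContainer = CriticWeightContainer,
	alpha = 0.1, -- larger values push the policy toward more exploration
	averagingRate = 0.995,
	discountFactor = 0.95,

}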

DeepDeterministicPolicyGradient


ReinforcementLearningModels.DeepDeterministicPolicyGradient{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, averagingRate: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • ActorModel: The actor model to be used for outputting actions.

  • ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.

  • CriticModel: The critic model to be used for outputting critic values.

  • CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.

  • averagingRate: The higher the value, the faster the weights change. The value must be set between 0 and 1. [Default: 0.995]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

TwinDelayedDeepDeterministicPolicyGradient


ReinforcementLearningModels.TwinDelayedDeepDeterministicPolicyGradient{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, noiseClippingFactor: number, policyDelayAmount: number, averagingRate: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • ActorModel: The actor model to be used for outputting actions.

  • ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.

  • CriticModel: The critic model to be used for outputting critic values.

  • CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.

  • noiseClippingFactor: The maximum amount of noise that is allowed in the action noise tensor. [Default: 0.5]

  • policyDelayAmount: The number of update function calls the actor model waits for before it is updated. [Default: 3]

  • averagingRate: The higher the value, the faster the weights change. The value must be set between 0 and 1. [Default: 0.995]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
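
A hypothetical construction call is sketched below; the actor and critic models and their weight containers are assumed to have been created beforehand.

local TwinDelayedDeepDeterministicPolicyGradientModel = ReinforcementLearningModels.TwinDelayedDeepDeterministicPolicyGradient{

	ActorModel = ActorModel,
	ActorWeightContainer = ActorWeightContainer,
	CriticModel = CriticModel,
	CriticWeightContainer = CriticWeightContainer,
	noiseClippingFactor = 0.5, -- limit on the noise added through the action noise tensor
	policyDelayAmount = 3, -- update the actor once per three update calls
	averagingRate = 0.995,
	discountFactor = 0.95,

}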

Functions

categoricalUpdate()

Updates the model parameters using categoricalUpdateFunction().

ReinforcementLearningModels:categoricalUpdate{previousFeatureTensor: tensor, action: number/string, rewardValue: number, currentFeatureTensor: tensor, terminalStateValue: number}

Parameters:

  • previousFeatureTensor: The previous state of the environment.

  • action: The action selected.

  • rewardValue: The reward gained at the current state.

  • currentFeatureTensor: The current state of the environment.

  • terminalStateValue: A value of 1 indicates that the current state is a terminal state. A value of 0 indicates that the current state is not terminal.
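
As an illustration, a single discrete-action training step might look like the hedged sketch below. DeepQLearningModel stands for any model created by one of the constructors above, and the environment-side values are placeholders supplied by your own code; only the categoricalUpdate() call itself comes from this API.

-- Placeholder values; in practice these come from your environment and action-selection code.
local previousFeatureTensor = environmentStateTensor -- hypothetical tensor for the state before acting
local action = "Left" -- the action that was selected (number or string)
local rewardValue = 1 -- reward received after acting
local currentFeatureTensor = newEnvironmentStateTensor -- hypothetical tensor for the state after acting
local terminalStateValue = 0 -- 1 if the episode ended at the current state, otherwise 0

DeepQLearningModel:categoricalUpdate{

	previousFeatureTensor = previousFeatureTensor,
	action = action,
	rewardValue = rewardValue,
	currentFeatureTensor = currentFeatureTensor,
	terminalStateValue = terminalStateValue,

}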

diagonalGaussianUpdate()

Updates the model parameters using diagonalGaussianUpdateFunction().

ReinforcementLearningModels:diagonalGaussianUpdate(previousFeatureTensor: tensor, actionNoiseTensor: tensor, rewardValue: number, currentFeatureTensor: tensor, terminalStateValue: number)

Parameters:

  • previousFeatureTensor: The previous state of the environment.

  • actionNoiseTensor: The tensor containing noise values for all actions.

  • rewardValue: The reward gained at the current state.

  • currentFeatureTensor: The current state of the environment.

  • terminalStateValue: A value of 1 indicates that the current state is a terminal state. A value of 0 indicates that the current state is not terminal.
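
For continuous actions, a single step might look like the hedged sketch below. DeepDeterministicPolicyGradientModel stands for any continuous-action model created by the constructors above, and all of the tensors and values are placeholders supplied by your own code.

-- Placeholder values; in practice these come from your environment and exploration-noise code.
DeepDeterministicPolicyGradientModel:diagonalGaussianUpdate(previousFeatureTensor, actionNoiseTensor, rewardValue, currentFeatureTensor, terminalStateValue)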

episodeUpdate()

Updates the model parameters using episodeUpdateFunction().

ReinforcementLearningModels:episodeUpdate()

reset()

Resets the model’s stored values (excluding the parameters).

ReinforcementLearningModels:reset()
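
To show how these functions fit together, a hedged outline of one training episode is given below. chooseAction and stepEnvironment are hypothetical helpers standing in for your own action-selection and environment code, and ReinforcementLearningModel stands for any model created by the constructors above.

local currentFeatureTensor = initialFeatureTensor -- hypothetical starting state tensor

for step = 1, maxStepCount do

	local action = chooseAction(currentFeatureTensor) -- hypothetical action-selection helper

	local rewardValue, nextFeatureTensor, terminalStateValue = stepEnvironment(action) -- hypothetical environment helper

	ReinforcementLearningModel:categoricalUpdate{previousFeatureTensor = currentFeatureTensor, action = action, rewardValue = rewardValue, currentFeatureTensor = nextFeatureTensor, terminalStateValue = terminalStateValue}

	currentFeatureTensor = nextFeatureTensor

	if (terminalStateValue == 1) then break end

end

ReinforcementLearningModel:episodeUpdate() -- apply any per-episode updates

ReinforcementLearningModel:reset() -- clear stored values before starting the next episode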