API Reference - ReinforcementLearningModels

Constructors

new


ReinforcementLearningModels.new{categoricalUpdateFunction: function, diagonalGaussianUpdateFunction: function, episodeUpdateFunction: function, resetFunction: function}: ReinforcementLearningModel

Parameters:

  • categoricalUpdateFunction: The update function for categorical actions.

  • diagonalGaussianUpdateFunction: The update function for diagonal Gaussian actions.

  • episodeUpdateFunction: The episode update function for all actions.

  • resetFunction: The reset function for the reinforcement learning model.

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
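
Below is a minimal sketch of wiring custom update logic into this constructor. The require path and the argument lists of the supplied functions are assumptions inferred from the categoricalUpdate() and diagonalGaussianUpdate() entries later in this reference, and the function bodies are placeholders rather than a working algorithm.

local ReinforcementLearningModels = require(script.Parent.ReinforcementLearningModels) -- assumed require path; adjust to your project layout

local CustomModel = ReinforcementLearningModels.new{

	categoricalUpdateFunction = function(previousFeatureTensor, action, rewardValue, currentFeatureTensor, terminalStateValue)
		-- Custom update rule for categorical (discrete) actions goes here.
	end,

	diagonalGaussianUpdateFunction = function(previousFeatureTensor, actionNoiseTensor, rewardValue, currentFeatureTensor, terminalStateValue)
		-- Custom update rule for diagonal Gaussian (continuous) actions goes here.
	end,

	episodeUpdateFunction = function()
		-- Logic that runs once per episode, such as applying accumulated updates.
	end,

	resetFunction = function()
		-- Clear any values stored between steps.
	end,

}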

DeepQLearning


ReinforcementLearningModels.DeepQLearning{Model: function, WeightContainer: WeightContainer, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
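
A hypothetical construction call is sketched below. Model and WeightContainer stand in for a forward-pass function and a weight container created elsewhere with the library; building them is outside the scope of this page.

local DeepQLearningModel = ReinforcementLearningModels.DeepQLearning{

	Model = Model, -- forward-pass function (assumed to exist)
	WeightContainer = WeightContainer, -- holds the model's weight tensors (assumed to exist)
	lambda = 0, -- pure Temporal Difference updates
	discountFactor = 0.95, -- favour long-term rewards fairly strongly

}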

DeepDoubleQLearningV1


ReinforcementLearningModels.DeepDoubleQLearningV1{Model: function, WeightContainer: WeightContainer, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

  • WeightTensorArrayArray: An array containing two weight tensor arrays.

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

DeepDoubleQLearningV2


ReinforcementLearningModels.DeepDoubleQLearningV2{Model: function, WeightContainer: WeightContainer, lambda: number, averagingRate: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • averagingRate: The higher the value, the faster the weights change. The value must be set between 0 and 1. [Default: 0.995]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
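
For example, a hedged construction sketch that sets the averaging rate explicitly might look like this (Model and WeightContainer are assumed to have been created beforehand):

local DeepDoubleQLearningModel = ReinforcementLearningModels.DeepDoubleQLearningV2{

	Model = Model,
	WeightContainer = WeightContainer,
	averagingRate = 0.995, -- how quickly the second set of weights follows the first
	discountFactor = 0.95,

}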

DeepClippedDoubleQLearning


ReinforcementLearningModels.DeepClippedDoubleQLearning{Model: function, WeightContainer: WeightContainer, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

  • WeightTensorArrayArray: An array containing two weight tensor arrays.

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

DeepStateActionRewardStateAction


ReinforcementLearningModels.DeepStateActionRewardStateAction{Model: function, WeightContainer: WeightContainer, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

DeepDoubleStateActionRewardStateActionV1


ReinforcementLearningModels.DeepDoubleStateActionRewardStateActionV1{Model: function, WeightContainer: WeightContainer, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

  • WeightTensorArrayArray: An array containing two weight tensor arrays.

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

DeepDoubleStateActionRewardStateActionV2


ReinforcementLearningModels.DeepDoubleStateActionRewardStateActionV2{Model: function, WeightContainer: WeightContainer, lambda: number, averagingRate: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • averagingRate: The higher the value, the faster the weights change. The value must be set between 0 and 1. [Default: 0.995]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

DeepExpectedStateActionRewardStateAction


ReinforcementLearningModels.DeepExpectedStateActionRewardStateAction{Model: function, WeightContainer: WeightContainer, epsilon: number, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • epsilon: Controls the balance between exploration and exploitation when calculating expected Q-values. The value must be set between 0 and 1, where 0 focuses on exploitation only and 1 focuses on exploration only. [Default: 0.5]

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
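
A hypothetical construction call is sketched below; Model and WeightContainer are assumed to have been created beforehand.

local DeepExpectedSARSAModel = ReinforcementLearningModels.DeepExpectedStateActionRewardStateAction{

	Model = Model,
	WeightContainer = WeightContainer,
	epsilon = 0.5, -- weight exploration and exploitation equally in the expected Q-value
	discountFactor = 0.95,

}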

DeepDoubleExpectedStateActionRewardStateActionV1


ReinforcementLearningModels.DeepDoubleExpectedStateActionRewardStateActionV1{Model: function, WeightContainer: WeightContainer, epsilon: number, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • epsilon: Controls the balance between exploration and exploitation when calculating expected Q-values. The value must be set between 0 and 1, where 0 focuses on exploitation only and 1 focuses on exploration only. [Default: 0.5]

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

  • WeightTensorArrayArray: An array containing two weight tensor arrays.

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

DeepDoubleExpectedStateActionRewardStateActionV2


ReinforcementLearningModels.DeepDoubleExpectedStateActionRewardStateActionV2{Model: function, WeightContainer: WeightContainer, epsilon: number, lambda: number, averagingRate: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • epsilon: Controls the balance between exploration and exploitation when calculating expected Q-values. The value must be set between 0 and 1, where 0 focuses on exploitation only and 1 focuses on exploration only. [Default: 0.5]

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • averagingRate: The higher the value, the faster the weights change. The value must be set between 0 and 1. [Default: 0.995]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

MonteCarloControl


ReinforcementLearningModels.MonteCarloControl{Model: function, WeightContainer: WeightContainer, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

OffPolicyMonteCarloControl


ReinforcementLearningModels.OffPolicyMonteCarloControl{Model: function, WeightContainer: WeightContainer, targetPolicyFunction: string, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • targetPolicyFunction: The target policy used to select actions, chosen by name. The policy is based on the current Q-values (or state-action values) and determines how the agent chooses actions from its current knowledge. Available options include:

    • Greedy: Selects the action with the highest Q-value for a given state. This is typically the optimal policy, assuming the Q-values are accurate.

    • Softmax: Selects actions probabilistically, where actions with higher Q-values are more likely to be chosen. The probability of selecting an action is determined by a temperature parameter that controls the exploration-exploitation trade-off.

    • StableSoftmax: A numerically more stable variant of Softmax. (Default)

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
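
A hypothetical construction call with an explicit target policy is sketched below; Model and WeightContainer are assumed to have been created beforehand.

local OffPolicyMonteCarloControlModel = ReinforcementLearningModels.OffPolicyMonteCarloControl{

	Model = Model,
	WeightContainer = WeightContainer,
	targetPolicyFunction = "Greedy", -- alternatives: "Softmax", "StableSoftmax"
	discountFactor = 0.95,

}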

REINFORCE


ReinforcementLearningModels.REINFORCE{Model: function, WeightContainer: WeightContainer, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • Model: The model to be used for outputting actions.

  • WeightContainer: The weight container to be used to update the model’s weight tensors.

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
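
REINFORCE is an episodic algorithm, so the usual workflow is to feed in steps during the episode and apply the accumulated information once the episode ends. A hedged construction sketch is shown below; Model and WeightContainer are assumed to have been created beforehand.

local REINFORCEModel = ReinforcementLearningModels.REINFORCE{

	Model = Model,
	WeightContainer = WeightContainer,
	discountFactor = 0.95,

}

-- Remember to call REINFORCEModel:episodeUpdate() at the end of each episode (see the Functions section below).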

VanillaPolicyGradient


ReinforcementLearningModels.VanillaPolicyGradient{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, AdvantageFunction: function, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • ActorModel: The actor model to be used for outputting actions.

  • ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.

  • CriticModel: The critic model to be used for outputting critic values.

  • CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.

  • AdvantageFunction: The advantage function used to update the actor model.

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
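
Actor-critic style constructors take separate actor and critic model/weight-container pairs. A hypothetical construction call is sketched below; the actor and critic models, their weight containers, and the advantage function are all assumed to have been created elsewhere.

local VanillaPolicyGradientModel = ReinforcementLearningModels.VanillaPolicyGradient{

	ActorModel = ActorModel,
	ActorWeightContainer = ActorWeightContainer,
	CriticModel = CriticModel,
	CriticWeightContainer = CriticWeightContainer,
	AdvantageFunction = AdvantageFunction, -- assumed to exist
	lambda = 0,
	discountFactor = 0.95,

}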

ActorCritic


ReinforcementLearningModels.ActorCritic{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • ActorModel: The actor model to be used for outputting actions.

  • ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.

  • CriticModel: The critic model to be used for outputting critic values.

  • CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

AdvantageActorCritic


ReinforcementLearningModels.AdvantageActorCritic{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, AdvantageFunction: function, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • ActorModel: The actor model to be used for outputting actions.

  • ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.

  • CriticModel: The critic model to be used for outputting critic values.

  • CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

ProximalPolicyOptimization


ReinforcementLearningModels.ProximalPolicyOptimization{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • ActorModel: The actor model to be used for outputting actions.

  • ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.

  • CriticModel: The critic model to be used for outputting critic values.

  • CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

ProximalPolicyOptimizationClip


ReinforcementLearningModels.ProximalPolicyOptimizationClip{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, clipRatio: number, lambda: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • ActorModel: The actor model to be used for outputting actions.

  • ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.

  • CriticModel: The critic model to be used for outputting critic values.

  • CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.

  • clipRatio: A value that controls how far the new policy can deviate from the old policy. The value must be set between 0 and 1. [Default: 0.3]

  • lambda: At 0, the model acts like the Temporal Difference algorithm. At 1, the model acts like the Monte Carlo algorithm. Between 0 and 1, the model behaves as a mix of both. [Default: 0]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
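
A hypothetical construction call is sketched below; the actor and critic models and their weight containers are assumed to have been created beforehand.

local ProximalPolicyOptimizationClipModel = ReinforcementLearningModels.ProximalPolicyOptimizationClip{

	ActorModel = ActorModel,
	ActorWeightContainer = ActorWeightContainer,
	CriticModel = CriticModel,
	CriticWeightContainer = CriticWeightContainer,
	clipRatio = 0.3, -- smaller values keep the new policy closer to the old one
	discountFactor = 0.95,

}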

SoftActorCritic


ReinforcementLearningModels.SoftActorCritic{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, alpha: number, averagingRate: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • ActorModel: The actor model to be used for outputting actions.

  • ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.

  • CriticModel: The critic model to be used for outputting critic values.

  • CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.

  • alpha: The entropy regularization coefficient. The higher the value, the more the model explores. Generally, the value is set between 0 and 1. [Default: 0.1]

  • averagingRate: The higher the value, the faster the weights change. The value must be set between 0 and 1. [Default: 0.995]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
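
A hypothetical construction call is sketched below; the actor and critic models and their weight containers are assumed to have been created beforehand.

local SoftActorCriticModel = ReinforcementLearningModels.SoftActorCritic{

	ActorModel = ActorModel,
	ActorWeightContainer = ActorWeightContainer,
	CriticModel = CriticModel,
	CriticWeightContainer = CriticWeightContainer,
	alpha = 0.1, -- larger values push the policy toward more exploration
	averagingRate = 0.995,
	discountFactor = 0.95,

}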

DeepDeterministicPolicyGradient


ReinforcementLearningModels.DeepDeterministicPolicyGradient{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, averagingRate: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • ActorModel: The actor model to be used for outputting actions.

  • ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.

  • CriticModel: The critic model to be used for outputting critic values.

  • CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.

  • averagingRate: The higher the value, the faster the weights change. The value must be set between 0 and 1. [Default: 0.995]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.

TwinDelayedDeepDeterministicPolicyGradient


ReinforcementLearningModels.TwinDelayedDeepDeterministicPolicyGradient{ActorModel: function, ActorWeightContainer: WeightContainer, CriticModel: function, CriticWeightContainer: WeightContainer, noiseClippingFactor: number, policyDelayAmount: number, averagingRate: number, discountFactor: number}: ReinforcementLearningModel

Parameters:

  • ActorModel: The actor model to be used for outputting actions.

  • ActorWeightContainer: The weight container to be used to update the actor model’s weight tensors.

  • CriticModel: The critic model to be used for outputting critic values.

  • CriticWeightContainer: The weight container to be used to update the critic model’s weight tensors.

  • noiseClippingFactor: The maximum amount of noise that is allowed in the action noise tensor. [Default: 0.5]

  • policyDelayAmount: The number of update function calls the actor model waits for before it is updated. [Default: 3]

  • averagingRate: The higher the value, the faster the weights change. The value must be set between 0 and 1. [Default: 0.995]

  • discountFactor: The higher the value, the more likely it focuses on long-term outcomes. The value must be set between 0 and 1. [Default: 0.95]

Returns:

  • ReinforcementLearningModel: The reinforcement learning model that is generated by the constructor.
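
A hypothetical construction call is sketched below; the actor and critic models and their weight containers are assumed to have been created beforehand.

local TwinDelayedDeepDeterministicPolicyGradientModel = ReinforcementLearningModels.TwinDelayedDeepDeterministicPolicyGradient{

	ActorModel = ActorModel,
	ActorWeightContainer = ActorWeightContainer,
	CriticModel = CriticModel,
	CriticWeightContainer = CriticWeightContainer,
	noiseClippingFactor = 0.5, -- limit on the noise added through the action noise tensor
	policyDelayAmount = 3, -- update the actor once per three update calls
	averagingRate = 0.995,
	discountFactor = 0.95,

}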

Functions

categoricalUpdate()

Updates the model parameters using categoricalUpdateFunction().

ReinforcementLearningModels:categoricalUpdate{previousFeatureTensor: tensor, action: number/string, rewardValue: number, currentFeatureTensor: tensor, terminalStateValue: number}

Parameters:

  • previousFeatureTensor: The previous state of the environment.

  • action: The action selected.

  • rewardValue: The reward gained at the current state.

  • currentFeatureTensor: The current state of the environment.

  • terminalStateValue: A value of 1 indicates that the current state is a terminal state. A value of 0 indicates that the current state is not terminal.
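
As an illustration, a single discrete-action training step might look like the hedged sketch below. DeepQLearningModel stands for any model created by one of the constructors above, and the environment-side values are placeholders supplied by your own code; only the categoricalUpdate() call itself comes from this API.

-- Placeholder values; in practice these come from your environment and action-selection code.
local previousFeatureTensor = environmentStateTensor -- hypothetical tensor for the state before acting
local action = "Left" -- the action that was selected (number or string)
local rewardValue = 1 -- reward received after acting
local currentFeatureTensor = newEnvironmentStateTensor -- hypothetical tensor for the state after acting
local terminalStateValue = 0 -- 1 if the episode ended at the current state, otherwise 0

DeepQLearningModel:categoricalUpdate{

	previousFeatureTensor = previousFeatureTensor,
	action = action,
	rewardValue = rewardValue,
	currentFeatureTensor = currentFeatureTensor,
	terminalStateValue = terminalStateValue,

}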

diagonalGaussianUpdate()

Updates the model parameters using diagonalGaussianUpdateFunction().

ReinforcementLearningModels:diagonalGaussianUpdate(previousFeatureTensor: tensor, actionNoiseTensor: tensor, rewardValue: number, currentFeatureTensor: tensor, terminalStateValue: number)

Parameters:

  • previousFeatureTensor: The previous state of the environment.

  • actionNoiseTensor: The tensor containing noise values for all actions.

  • rewardValue: The reward gained at the current state.

  • currentFeatureTensor: The current state of the environment.

  • terminalStateValue: A value of 1 indicates that the current state is a terminal state. A value of 0 indicates that the current state is not terminal.
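
For continuous actions, a single step might look like the hedged sketch below. DeepDeterministicPolicyGradientModel stands for any continuous-action model created by the constructors above, and all of the tensors and values are placeholders supplied by your own code.

-- Placeholder values; in practice these come from your environment and exploration-noise code.
DeepDeterministicPolicyGradientModel:diagonalGaussianUpdate(previousFeatureTensor, actionNoiseTensor, rewardValue, currentFeatureTensor, terminalStateValue)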

episodeUpdate()

Updates the model parameters using episodeUpdateFunction().

ReinforcementLearningModels:episodeUpdate()

reset()

Resets the model’s stored values (excluding the parameters).

ReinforcementLearningModels:reset()
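
To show how these functions fit together, a hedged outline of one training episode is given below. chooseAction and stepEnvironment are hypothetical helpers standing in for your own action-selection and environment code, and ReinforcementLearningModel stands for any model created by the constructors above.

local currentFeatureTensor = initialFeatureTensor -- hypothetical starting state tensor

for step = 1, maxStepCount do

	local action = chooseAction(currentFeatureTensor) -- hypothetical action-selection helper

	local rewardValue, nextFeatureTensor, terminalStateValue = stepEnvironment(action) -- hypothetical environment helper

	ReinforcementLearningModel:categoricalUpdate{previousFeatureTensor = currentFeatureTensor, action = action, rewardValue = rewardValue, currentFeatureTensor = nextFeatureTensor, terminalStateValue = terminalStateValue}

	currentFeatureTensor = nextFeatureTensor

	if (terminalStateValue == 1) then break end

end

ReinforcementLearningModel:episodeUpdate() -- apply any per-episode updates

ReinforcementLearningModel:reset() -- clear stored values before starting the next episode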