API Reference - Models - OffPolicyMonteCarloControl

OffPolicyMonteCarloControl is a neural network with reinforcement learning capabilities. It can predict any positive number of discrete values.

Constructors

new()

Create a new model object. If any of the arguments are nil, the default value for that argument will be used.

OffPolicyMonteCarloControl.new(targetPolicyFunction: string, discountFactor: number): ModelObject

Parameters:

  • targetPolicyFunction: The name of the function that defines the target policy used to select actions. The policy is based on the current Q-values (state-action values) and determines how the agent chooses actions given its current knowledge. Available options:

    • Greedy: Selects the action with the highest Q-value for a given state. This is typically the optimal policy, assuming the Q-values are accurate.

    • Softmax: Selects actions probabilistically, where actions with higher Q-values are more likely to be chosen. The probability of selecting an action is determined by a temperature parameter that controls the exploration-exploitation trade-off.

    • StableSoftmax: A numerically more stable variant of Softmax. (Default)

  • discountFactor: Determines how much the agent values future rewards; a reward received k steps in the future is weighted by discountFactor^k, so higher values place more weight on long-term outcomes. The value must be between 0 and 1.

Returns:

  • ModelObject: The generated model object.
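A minimal construction sketch is shown below. The require path and the Models container are assumptions inferred from this page's breadcrumb rather than confirmed API, so adjust them to wherever the library lives in your project.

-- Hypothetical require path; point this at your copy of the library.
local DataPredict = require(script.Parent.DataPredict)

-- "Models" container assumed from this page's breadcrumb.
local OffPolicyMonteCarloControl = DataPredict.Models.OffPolicyMonteCarloControl

-- Both arguments may be nil, in which case the defaults are used
-- (StableSoftmax is the default target policy function).
local model = OffPolicyMonteCarloControl.new("StableSoftmax", 0.95)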

Functions

setParameters()

Set the model’s parameters. When any of the arguments are nil, the previously set value for that argument will be used.

OffPolicyMonteCarloControl:setParameters(targetPolicyFunction: string, discountFactor: number)

Parameters:

  • targetPolicyFunction: The name of the function that defines the target policy used to select actions. The policy is based on the current Q-values (state-action values) and determines how the agent chooses actions given its current knowledge. Available options:

    • Greedy: Selects the action with the highest Q-value for a given state. This is typically the optimal policy, assuming the Q-values are accurate.

    • Softmax: Selects actions probabilistically, where actions with higher Q-values are more likely to be chosen. The probability of selecting an action is determined by a temperature parameter that controls the exploration-exploitation trade-off.

    • StableSoftmax: A numerically more stable variant of Softmax. (Default)

  • discountFactor: Determines how much the agent values future rewards; a reward received k steps in the future is weighted by discountFactor^k, so higher values place more weight on long-term outcomes. The value must be between 0 and 1. A usage sketch follows this list.
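Continuing the hypothetical setup from the constructor sketch above, passing nil for an argument keeps its previously set value:

-- Switch the target policy function while keeping the current discount factor.
model:setParameters("Greedy", nil)

-- Change only the discount factor; the Greedy policy set above is retained.
model:setParameters(nil, 0.99)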

Inherited From

References