Beta Version 2.11.0

Added

  • Added these value schedulers under the “ValueSchedulers” section (a sketch of one of the schedules follows the list):

    • Chained

    • Constant

    • CosineAnnealing

    • Exponential

    • InverseSquareRoot

    • InverseTime

    • Linear

    • MultipleStep

    • Multiplicative

    • Polynomial

    • Sequential

    • Step
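
  As a rough indication of what these schedulers compute, here is a minimal sketch of the usual cosine-annealing schedule, one of the entries above. This is a conceptual illustration only; the function and parameter names are assumptions and do not come from the library.

    -- Conceptual sketch of cosine annealing, not the library's scheduler code.
    local function cosineAnnealedValue(initialValue, minimumValue, timeValue, maxTimeValue)
        -- Half-cosine factor that falls from 1 at timeValue = 0 to 0 at timeValue = maxTimeValue.
        local cosineFactor = (1 + math.cos(math.pi * timeValue / maxTimeValue)) / 2
        return minimumValue + (initialValue - minimumValue) * cosineFactor
    end

    print(cosineAnnealedValue(0.1, 0.001, 50, 100)) -- 0.0505, halfway along the decay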

  • Added these optimizers under the “Optimizers” section (a sketch of the decoupled weight-decay idea follows the list):

    • ResilientBackwardPropagation

    • AdaptiveFactor

    • AdaptiveMomentEstimationWeightDecay

    • RectifiedAdaptiveMomentEstimation
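
  AdaptiveMomentEstimationWeightDecay is presumably the AdamW-style optimizer, whose defining trait is that weight decay acts on the parameter directly rather than being folded into the gradient. The sketch below only illustrates that decoupled step for a single scalar parameter; the names are illustrative, not the library’s actual fields.

    -- Conceptual sketch of a decoupled weight-decay step, not the library's optimizer code.
    local function decoupledWeightDecayStep(parameterValue, adamUpdateValue, learningRate, weightDecayRate)
        -- Decay acts on the parameter itself, separately from the adaptive gradient update.
        return parameterValue - learningRate * (adamUpdateValue + weightDecayRate * parameterValue)
    end
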
  • Added these tabular reinforcement learning models under the “Models” section (a sketch of the classic double Q-learning update follows the list):

    • TabularClippedDoubleQLearning

    • TabularDoubleQLearningV1

    • TabularDoubleQLearningV2

    • TabularDoubleStateActionRewardStateActionV1

    • TabularDoubleStateActionRewardStateActionV2

    • TabularDoubleExpectedStateActionRewardStateActionV1

    • TabularDoubleExpectedStateActionRewardStateActionV2
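
  The double variants above are presumably named after the classic double Q-learning scheme, in which two value tables are maintained so that one selects the greedy next action and the other evaluates it, reducing the overestimation bias of plain Q-learning. The sketch below shows that textbook tabular update, not the library’s actual code; in the full algorithm the roles of the two tables are also swapped at random on each step.

    -- Conceptual sketch of the textbook double Q-learning update, not the library's model code.
    local function doubleQLearningUpdate(qA, qB, state, action, reward, nextState, learningRate, discountFactor)
        -- Table A picks the greedy next action...
        local bestNextAction, bestNextValue = nil, -math.huge
        for candidateAction, value in pairs(qA[nextState]) do
            if value > bestNextValue then bestNextAction, bestNextValue = candidateAction, value end
        end
        -- ...and table B evaluates it; table A's estimate then moves toward that target.
        local targetValue = reward + discountFactor * qB[nextState][bestNextAction]
        qA[state][action] = qA[state][action] + learningRate * (targetValue - qA[state][action])
    end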

  • Added OneVsOne under the “Others” section.

  • Added a “stateIndex” parameter as the first parameter of BaseEligibilityTrace’s incrementFunction() under the “EligibilityTraces” section.

  • Added “StableSoftmaxSampling” and “StableBoltzmannSampling” options to the CategoricalPolicy under the “QuickSetups” section.
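
  The new “Stable” options most likely refer to the numerically stable form of softmax, which subtracts the maximum action value before exponentiating so that exp() cannot overflow for large values. The sketch below is a conceptual illustration of that technique, not the library’s internal sampling code, and the function name is made up.

    -- Conceptual sketch of numerically stable softmax, not the library's code.
    local function stableSoftmax(actionValues)
        local maximumValue = -math.huge
        for _, value in ipairs(actionValues) do
            if value > maximumValue then maximumValue = value end
        end
        local exponentials, exponentialSum = {}, 0
        for index, value in ipairs(actionValues) do
            exponentials[index] = math.exp(value - maximumValue)
            exponentialSum = exponentialSum + exponentials[index]
        end
        local probabilities = {}
        for index, exponential in ipairs(exponentials) do
            probabilities[index] = exponential / exponentialSum
        end
        return probabilities
    end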

Changes

  • ValueSchedulers can now be used in place of Optimizers for scheduling the learning rate.

  • All tabular reinforcement learning models can now be used inside the CategoricalPolicy quick setup under the “Models” section.

  • TabularReinforcementLearningBaseModel’s new() constructor now accepts “Optimizer” as one of its parameters under the “Models” section.
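
  A hypothetical sketch of what passing the new “Optimizer” parameter might look like, assuming the constructor accepts a table of named parameters; the require path, module paths and constructor shape shown here are assumptions rather than the library’s documented API.

    local DataPredict = require(game.ServerScriptService.DataPredict) -- the require path is an assumption

    local Optimizer = DataPredict.Optimizers.AdaptiveMomentEstimation.new()

    local Model = DataPredict.Models.TabularQLearning.new({
        Optimizer = Optimizer, -- the newly accepted constructor parameter
    })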

  • BaseEligibilityTrace’s new() constructor function now accepts “mode” as one of its parameters under the “EligibilityTraces” section.

  • TabularQLearning, TabularStateActionRewardStateAction and TabularExpectedStateActionRewardStateAction now use an EligibilityTrace object instead of a “lambda” numerical value under the “Models” section.
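
  For context on what the EligibilityTrace object replaces, the sketch below shows the textbook accumulating-trace update that a bare “lambda” value used to parameterise: the trace of the visited state-action pair is incremented, every table entry is nudged by the temporal-difference error in proportion to its trace, and all traces then decay by discountFactor times lambda. This is a conceptual illustration, not the library’s code.

    -- Conceptual sketch of an accumulating eligibility trace, not the library's code.
    local function updateWithEligibilityTrace(qTable, traceTable, temporalDifferenceError, learningRate, discountFactor, lambda, visitedState, visitedAction)
        traceTable[visitedState][visitedAction] = traceTable[visitedState][visitedAction] + 1 -- accumulate on the visited pair
        for state, actionValues in pairs(qTable) do
            for action in pairs(actionValues) do
                qTable[state][action] = qTable[state][action] + learningRate * temporalDifferenceError * traceTable[state][action]
                traceTable[state][action] = discountFactor * lambda * traceTable[state][action] -- decay every pair
            end
        end
    end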

  • The calculate() function now accepts ModelParameters as the third parameter for all the optimizers under the “Optimizers” section.

  • Renamed AdaptiveGradientDelta to AdaptiveDelta under the “Optimizers” section.

  • Optimized the NeuralNetwork code so that the activation function is no longer applied to bias values under the “Models” section.
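
  To illustrate why this matters, here is a conceptual sketch only, under the assumed layout where the first entry of a layer output holds the constant bias value: applying, say, a sigmoid to that entry would silently turn a bias of 1 into roughly 0.731, so the bias entry has to be skipped.

    -- Conceptual sketch with an assumed layout (bias stored at index 1), not the library's code.
    local function applyActivationSkippingBias(layerOutput, activationFunction)
        local activatedOutput = { layerOutput[1] } -- bias entry passes through unchanged
        for index = 2, #layerOutput do
            activatedOutput[index] = activationFunction(layerOutput[index])
        end
        return activatedOutput
    end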

  • The Optimizers’ “internalParameterArray” value is now set to nil instead of an empty table when calling the new() constructor and the reset() function under the “Optimizers” section.

  • Made some internal code changes to the DeepDoubleQLearningV2, DeepDoubleStateActionRewardStateActionV2 and DeepDoubleExpectedStateActionRewardStateActionV2 models under the “Models” section.

  • Changed the default value of the “averagingRate” parameter for the DeepDoubleQLearningV2, DeepDoubleStateActionRewardStateActionV2 and DeepDoubleExpectedStateActionRewardStateActionV2 models from 0.995 to 0.01 under the “Models” section.
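
  Assuming “averagingRate” plays the role of the Polyak averaging coefficient for these models’ target parameters (the exact convention used by the library is not stated here), the change from 0.995 to 0.01 shifts the default from near-instant copying of the online parameters to a slow, stable drift of the target toward them.

    -- Conceptual sketch of the common soft-update convention, not the library's code.
    local function polyakAverage(targetParameter, onlineParameter, averagingRate)
        return averagingRate * onlineParameter + (1 - averagingRate) * targetParameter
    end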

Removed

  • Removed TimeDecay and StepDecay from the “ValueSchedulers” section.

  • Removed LearningRateStepDecay and LearningRateTimeDecay from the “Optimizers” section.

  • Removed StringSplitter and Tokenizer from the “Others” section.

Fixes

  • Fixed some bugs in AdaptiveMomentEstimation and NesterovAcceleratedAdaptiveMomentEstimation under the “Optimizers” section.

  • Fixed some bugs where BaseRegularizer and BaseEligibilityTrace returned an “Unknown” class name under the “Regularizers” and “EligibilityTraces” sections.

  • Fixed some bugs in CategoricalPolicy under the “QuickSetups” section.