API Reference - Solvers

General Information

| Solver | Convergence Speed | Computational Efficiency (Non-Cached) | Computational Efficiency (Cached)* | Optimizer Compatibility | Properties | Best For |
|---|---|---|---|---|---|---|
| Gradient | Low | High | Very High | Very High | 🔰 📈 | General-Purpose |
| ConjugateGradient | Medium | Medium | Medium-High | High | 🛡️ 📈 | Large Datasets + Most Values Are Zero (a.k.a. Sparse Data) |
| GaussNewton | Medium-High | Medium | Very High | Medium | 🔒 🎯 | Small-Medium Datasets + Well-Defined Problems |
| LevenbergMarquardt | Medium-High | Medium | Very High | Medium | 🔒 ⚠️ | Small-Medium Datasets + Poorly-Defined Problems |
| IterativelyReweighted | Medium | Medium | Medium | Medium | 🔒 🛡️ | Individual Datapoints Have Different Contributions To Feature Weights |
| GreedyCoordinate | Low | High | Very High | Very Low | 🛡️ 📈 | Large Datasets + Most Values Are Zero (a.k.a. Sparse Data) |
| RandomCoordinate | Low | Very High | Very High | Very Low | 🛡️ 📈 | Extremely Large Datasets |
| GaussSeidel | Low | Medium | Medium-High | Very Low | 🔒 💥 | Alternative To Gradient |
| Jacobi | Low | Medium-Low | Medium | Very Low | 🔒 💥 | Alternative To Gradient |

* Computational efficiency due to caching only applies to models whose input is a constant linear expression:

  • In other words, any model whose input depends on nested functions with their own weights, such as a neural network, cannot be cached, regardless of whether a non-linear function is applied to that input.

  • However, models that merely apply a non-linear transformation to a constant linear expression, such as BinaryRegression, PoissonRegression, NegativeBinomialRegression, and GammaRegression, still benefit from caching.
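As a rough illustration of this caching idea (a minimal NumPy sketch, not this library's API; all names below are hypothetical), when the model input is a constant linear expression `X @ w`, the products `X.T @ X` and `X.T @ y` never change between iterations, so they can be computed once and reused on every gradient step:

```python
import numpy as np

# Illustrative sketch only: caching is possible because the design
# matrix X is constant, so X.T @ X and X.T @ y are fixed across steps.

rng = np.random.default_rng(0)
m, n = 200, 5                        # m data points, n features
X = rng.normal(size=(m, n))          # constant design matrix
true_w = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
y = X @ true_w + 0.1 * rng.normal(size=m)

gram = X.T @ X                       # cached once: O(n^2 m) work
xty = X.T @ y                        # cached once: O(mn) work

w = np.zeros(n)
lr = 0.003
for _ in range(300):
    # Gradient of 0.5 * ||X @ w - y||^2, written with cached terms:
    grad = gram @ w - xty            # O(n^2) per step instead of O(mn)
    w -= lr * grad
```

With a neural network, `X` would itself depend on inner weights that change every step, so `gram` could not be cached.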

Legend

| Icon | Name | Description |
|---|---|---|
| 🔰 | Beginner Solver | Commonly taught to beginners. |
| 🔒 | Data Constraint | The number of data points cannot be less than the number of features. |
| 🛡️ | Noise Resistant | Can handle randomness / unclean data. |
| 🎯 | Exact Solution | Finds the exact optimum (for linear problems). |
| 📈 | Scales Well | Handles large datasets. |
| ⚠️ | Double Regularization Issue | Contains a built-in regularization term that may conflict with external regularizers. |
| 💥 | Explosion-Prone | Tends to produce very large weight values. |

Number Of Cache Operations

m: Number Of Data Points
n: Number Of Features

| Solver | Number Of Operations Saved When Cached | Remaining Non-Cacheable Operations |
|---|---|---|
| Gradient | O(mn) | O(n^2m) |
| ConjugateGradient | O(n^2m + mn) | Huge |
| GaussNewton | O(n^3 + 2(n^2m) + mn) | O(n^2m) |
| LevenbergMarquardt | O(n^3 + 2(n^2m) + 2(mn)) | O(n^2m) |
| IterativelyReweighted | O(mn) | O(n^2m) |
| GreedyCoordinate | O(mn) | O(n^2m) |
| RandomCoordinate | O(mn) | O(n^2) |
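To get a feel for the magnitudes behind these expressions, the hypothetical helper below plugs concrete values of m and n into the table's formulas. Constant factors hidden by big-O notation are ignored, and the function names are illustrative, not part of this library:

```python
# Illustrative evaluation of the table's big-O expressions for concrete
# sizes; constant factors are ignored, so treat these as rough magnitudes.

def gradient_saved(m: int, n: int) -> int:
    return m * n                          # O(mn) saved per cached iteration

def gauss_newton_saved(m: int, n: int) -> int:
    return n**3 + 2 * (n**2 * m) + m * n  # O(n^3 + 2(n^2m) + mn)

m, n = 100_000, 50
print(f"Gradient saves    ~{gradient_saved(m, n):,} ops per iteration")
print(f"GaussNewton saves ~{gauss_newton_saved(m, n):,} ops per iteration")
```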

Convergence Speed And Convergence Cost Save Analysis (When Cached)

| Number Of Iterations | Gradient Cost | GaussNewton Cost | Gradient Converged? | GaussNewton Converged? | Accumulated Gradient's Gain Balance |
|---|---|---|---|---|---|
| 1 | O(n^2m + mn) | O(n^3 + 2(n^2m) + 2(mn)) | No | No | +O(n^3 + 2(n^2m) + mn) |
| 20 | O(mn) | O(mn) | No | Yes | +O(n^3 + 2(n^2m) + mn) - 19 * O(mn) (Convergence Waste) |
| 100 | O(mn) | O(mn) | No | Yes | +O(n^3 + 2(n^2m) + mn) - 99 * O(mn) (Convergence Waste) |
| 1000 | O(mn) | O(mn) | No | Yes | +O(n^3 + 2(n^2m) + mn) - 999 * O(mn) (Convergence Waste) |
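The trade-off above can be sketched as a rough break-even estimate: Gradient starts with a head start by skipping the expensive Gauss-Newton solve, but each extra O(mn) iteration erodes that gain. The sketch below uses the cost expressions from the table with constant factors ignored; the function names are hypothetical, not part of this library:

```python
# Illustrative break-even estimate from the table's cost expressions
# (big-O constant factors ignored, so this is a rough magnitude only).

def gauss_newton_cost(m: int, n: int) -> int:
    # One Gauss-Newton solve: O(n^3 + 2(n^2m) + 2(mn))
    return n**3 + 2 * (n**2 * m) + 2 * (m * n)

def gradient_step_cost(m: int, n: int) -> int:
    # One gradient iteration once the cache is built: O(mn)
    return m * n

def break_even_iterations(m: int, n: int) -> int:
    # Roughly how many extra gradient iterations erase the head start
    # Gradient gets from skipping the Gauss-Newton solve.
    return gauss_newton_cost(m, n) // gradient_step_cost(m, n)

print(break_even_iterations(10_000, 20))
```

If Gradient needs far more iterations than this estimate to converge, the one-time Gauss-Newton cost is the cheaper option overall.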