# API Reference - Solvers

## General Information
| Solver | Convergence Speed | Computational Efficiency (Non-Cached) | Computational Efficiency (Cached)* | Optimizer Compatibility | Properties | Best For |
|---|---|---|---|---|---|---|
| Gradient | Low | High | Very High | Very High | 🔰 📈 | General-Purpose |
| ConjugateGradient | Medium | Medium | Medium-High | High | 🛡️ 📈 | Large Datasets + Most Values Are Zero (a.k.a. Sparse Data) |
| GaussNewton | Medium-High | Medium | Very High | Medium | 🔢 🎯 | Small-Medium Datasets + Well-Defined Problems |
| LevenbergMarquardt | Medium-High | Medium | Very High | Medium | 🔢 ⚠️ | Small-Medium Datasets + Poorly-Defined Problems |
| IterativelyReweighted | Medium | Medium | Medium | Medium | 🔢 🛡️ | Individual Datapoints Have Different Contributions To Feature Weights |
| GreedyCoordinate | Low | High | Very High | Very Low | 🛡️ 📈 | Large Datasets + Most Values Are Zero (a.k.a. Sparse Data) |
| RandomCoordinate | Low | Very High | Very High | Very Low | 🛡️ 📈 | Extremely Large Datasets |
| GaussSeidel | Low | Medium | Medium-High | Very Low | 🔢 💥 | Alternative To Gradient |
| Jacobi | Low | Medium-Low | Medium | Very Low | 🔢 💥 | Alternative To Gradient |
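The "Best For" column above can be read as a rough decision rule. A hypothetical helper sketching that rule might look like the following (the function name and the size thresholds are illustrative only, not part of this library):

```python
# Hypothetical helper mirroring the "Best For" column of the solver table.
# The thresholds (100_000, 1_000_000) are made-up illustrative cutoffs.
def suggest_solver(m: int, n: int, sparse: bool, well_defined: bool) -> str:
    if sparse and m > 100_000:
        return "ConjugateGradient"   # large + sparse data
    if m > 1_000_000:
        return "RandomCoordinate"    # extremely large datasets
    if m >= n:                       # data constraint: m must not be below n
        return "GaussNewton" if well_defined else "LevenbergMarquardt"
    return "Gradient"                # general-purpose fallback

print(suggest_solver(m=500, n=20, sparse=False, well_defined=True))
# prints "GaussNewton"
```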
* Cache-based computational efficiency only applies to models whose inputs form a constant linear expression:
  - Anything whose input depends on nested functions with their own weights (e.g. a neural network) will not be cached, regardless of whether a non-linear function is applied to that input.
  - However, models that apply a non-linear transformation to a single linear expression still benefit from the cache, e.g. BinaryRegression, PoissonRegression, NegativeBinomialRegression, and GammaRegression.
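To illustrate why a constant linear expression enables caching, here is a minimal least-squares sketch (not this library's actual internals): when the model input is `X @ w`, the quantities `X.T @ X` and `X.T @ y` never change between iterations and can be computed once, shrinking the per-iteration gradient cost from touching all `m` rows to an `n × n` product.

```python
import numpy as np

# Illustrative sketch only: caching the Gram matrix for a linear-input model.
rng = np.random.default_rng(0)
m, n = 1000, 5                    # m data points, n features
X = rng.normal(size=(m, n))
y = rng.normal(size=m)
w = rng.normal(size=n)

# Non-cached: touches all m rows on every iteration, O(mn) per step.
grad_uncached = X.T @ (X @ w - y)

# Cached: pay O(n^2 m) once, then each iteration is only O(n^2).
G = X.T @ X                       # Gram matrix, reused across iterations
b = X.T @ y                       # cached projection of the targets
grad_cached = G @ w - b

assert np.allclose(grad_uncached, grad_cached)
```

A neural network cannot do this because its inner layers' weights change every iteration, invalidating any cached product.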
## Legend
| Icon | Name | Description |
|---|---|---|
| 🔰 | Beginner Solver | Commonly taught to beginners. |
| 🔢 | Data Constraint | The number of data points cannot be less than the number of features. |
| 🛡️ | Noise Resistant | Can handle randomness / unclean data. |
| 🎯 | Exact Solution | Finds the exact optimum (for linear problems). |
| 📈 | Scales Well | Handles large datasets. |
| ⚠️ | Double Regularization Issue | Contains a built-in regularization term that may conflict with external regularizers. |
| 💥 | Explosion-Prone | Tends to produce very large values. |
## Number Of Cache Operations

m: number of data points, n: number of features
| Solver | Operations Saved When Cached | Operations Remaining When Not Cached |
|---|---|---|
| Gradient | O(mn) | O(n^2m) |
| ConjugateGradient | O(n^2m + mn) | Huge |
| GaussNewton | O(n^3 + 2(n^2m) + mn) | O(n^2m) |
| LevenbergMarquardt | O(n^3 + 2(n^2m) + 2(mn)) | O(n^2m) |
| IterativelyReweighted | O(mn) | O(n^2m) |
| GreedyCoordinate | O(mn) | O(n^2m) |
| RandomCoordinate | O(mn) | O(n^2) |
## Convergence Speed And Convergence Cost Savings Analysis (When Cached)
| Number Of Iterations | Gradient Cost | GaussNewton Cost | Gradient Converged? | GaussNewton Converged? | Accumulated Gradient's Gain Balance |
|---|---|---|---|---|---|
| 1 | O(n^2m + mn) | O(n^3 + 2(n^2m) + 2(mn)) | No | No | +O(n^3 + 2(n^2m) + mn) |
| 20 | O(mn) | O(mn) | No | Yes | +O(n^3 + 2(n^2m) + mn) - 19 * O(mn) (convergence waste) |
| 100 | O(mn) | O(mn) | No | Yes | +O(n^3 + 2(n^2m) + mn) - 99 * O(mn) (convergence waste) |
| 1000 | O(mn) | O(mn) | No | Yes | +O(n^3 + 2(n^2m) + mn) - 999 * O(mn) (convergence waste) |
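The balance column can be made concrete with a back-of-the-envelope sketch: treat the asymptotic bounds as literal operation counts (a loose assumption) for a hypothetical dataset size and see when Gradient's cheap iterations stop paying for GaussNewton's one-time setup cost.

```python
# Rough sketch of the table above with concrete (hypothetical) sizes.
m, n = 10_000, 50  # illustrative dataset: 10k data points, 50 features

setup_gain = n**3 + 2 * (n**2 * m) + m * n  # GaussNewton's one-time cost avoided
waste_per_iter = m * n                      # each extra Gradient iteration after
                                            # GaussNewton would have converged

for iters in (1, 20, 100, 1000):
    balance = setup_gain - (iters - 1) * waste_per_iter
    status = "ahead" if balance > 0 else "behind"
    print(f"{iters:>4} iterations: Gradient is {status} by {abs(balance):,} ops")
```

For this shape, `setup_gain / waste_per_iter` is roughly 101, so Gradient stays ahead through about 100 wasted iterations and falls behind well before 1000, matching the trend in the table.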