API Reference - Models - ExpectationMaximization (EM)

ExpectationMaximization is an unsupervised machine learning model that estimates the probability distribution of a dataset and assigns each data point to a cluster based on its most likely probability.

Stored Model Parameters

Contains a table of matrices.

  • ModelParameters[1]: meanMatrix. The rows represent the clusters. The columns represent the features.

  • ModelParameters[2]: varianceMatrix. The rows represent the clusters. The columns represent the features.

  • ModelParameters[3]: piMatrix. The rows represent the clusters.

Constructors

new()

Create new model object. If any of the arguments are nil, default argument values for that argument will be used.

ExpectationMaximization.new(maximumNumberOfIterations: integer, numberOfClusters: integer, mode: string, useLogProbabilities: boolean, distanceFunction: string, epsilon: number): ModelObject

Parameters

  • maximumNumberOfIterations: The maximum number of iterations.

  • numberOfClusters: Number of clusters for model to train and predict on. When using default or set to math.huge(), it will find the best number of clusters using Bayesian information criterion.

  • mode: Controls the mode of the model. Available options are:

    • Hybrid (Default)

    • Online

    • Offline

  • useLogProbabilities: Controls whether or not to convert probabilities using the logarithm function for numerical stability [Default: False].

  • distanceFunction: The distance function to be used to initialize the centroids. Available options are:

    • Euclidean (Default)

    • Manhattan

    • Cosine

  • epsilon: The value to ensure that Gaussian calculation doesn’t reach infinity.

Returns:

  • Model: The generated model object.

Functions

train()

Train the model.

ExpectationMaximization:train(featureMatrix: Matrix)

Parameters:

  • featureMatrix: Matrix containing all data.

Returns:

  • costArray: An array containing cost values.

predict()

Predict which cluster does it belong to for a given data.

ExpectationMaximization:predict(featureMatrix: Matrix, returnOriginalOutput: boolean): Matrix, Matrix -OR- Matrix

Parameters:

  • featureMatrix: Matrix containing data.

  • returnOriginalOutput: Set whether or not to return probabilityMatrix matrix instead of clusterNumberVector and closestDistanceVector.

Returns:

  • clusterNumberVector: A vector containing the cluster that the data belongs to.

  • highestProbabilityVector: The probability (n x 1) matrix of the datapoint belongs to that particular cluster.

-OR-

probabilityMatrix: A matrix containing data-cluster pair probability.

Inherited From

References