ML Spark

Machine learning algorithms implemented in Scala on Spark.

Four models are currently included:

Gaussian Naive Bayes – Naive Bayes classifier for continuous features. Assumes the likelihood of each feature follows a Gaussian distribution: P(x_i | y) = (1 / sqrt(2 * pi * sigma_y^2)) * exp(-(x_i - mu_y)^2 / (2 * sigma_y^2)). The posterior for each class is estimated, up to a normalizing constant, by combining the likelihoods of all features for that class with the class prior probability (in log space, by summing the log-likelihoods and the log prior).
K Means – Performs k-means clustering on data samples labeled by class. The distance function distMeasure may be set to either euclidean (default) or cosine. Distance functions are passed internally as partially defined functions for extensibility. Both the mean and the standard deviation are calculated and recorded for each cluster, which is useful for generating radial basis functions based on distance from the clusters.
Logistic Regression – Binary logistic regression classifier with L2 regularization. The loss function is minimized with gradient descent.
Softmax Logistic Regression – Multi-class logistic regression with optional regularization: L1, L1 (with clipping), L2, or none (default). Regularization gradient update functions are specified and passed as partials for extensibility.
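The Gaussian Naive Bayes description can be illustrated with a minimal sketch. It assumes plain Scala collections instead of Spark RDDs, integer class labels, and a small variance floor (`Epsilon`) to avoid division by zero; all names (`GaussianNBSketch`, `ClassStats`, `fit`, `predict`) are hypothetical and not the repository's API. It scores each class by the log prior plus the sum of per-feature Gaussian log-likelihoods.

```scala
// Hypothetical sketch of Gaussian Naive Bayes on plain Scala collections.
object GaussianNBSketch {
  // Per-class statistics: prior P(y) and per-feature mean and variance.
  final case class ClassStats(prior: Double, means: Array[Double], variances: Array[Double])

  // Variance floor so a constant feature does not cause division by zero (assumption).
  private val Epsilon = 1e-9

  // Estimate priors, means and variances for each class label.
  def fit(data: Seq[(Int, Array[Double])]): Map[Int, ClassStats] = {
    val n = data.size.toDouble
    data.groupBy(_._1).map { case (label, rows) =>
      val xs = rows.map(_._2)
      val d = xs.head.length
      val means = Array.tabulate(d)(j => xs.map(_(j)).sum / xs.size)
      val variances = Array.tabulate(d) { j =>
        xs.map(x => math.pow(x(j) - means(j), 2)).sum / xs.size + Epsilon
      }
      label -> ClassStats(xs.size / n, means, variances)
    }
  }

  // log P(x_i | y) for a Gaussian with mean mu and variance sigma2.
  def logGaussian(x: Double, mu: Double, sigma2: Double): Double =
    -0.5 * math.log(2 * math.Pi * sigma2) - math.pow(x - mu, 2) / (2 * sigma2)

  // Unnormalized log-posterior: log P(y) + sum_i log P(x_i | y).
  def logPosterior(stats: ClassStats, x: Array[Double]): Double =
    math.log(stats.prior) +
      x.indices.map(i => logGaussian(x(i), stats.means(i), stats.variances(i))).sum

  // Predict the class with the highest log-posterior.
  def predict(model: Map[Int, ClassStats], x: Array[Double]): Int =
    model.maxBy { case (_, s) => logPosterior(s, x) }._1
}
```

Working in log space avoids the underflow that multiplying many small likelihoods can cause. On Spark, the per-class counts and sums would more typically be aggregated with a pair-RDD operation such as `reduceByKey` than with `groupBy`.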
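For K Means, the selectable distance function and the per-cluster statistics can be sketched as follows. This is a hypothetical sketch on plain Scala collections: `KMeansSketch`, `distanceFor`, `Cluster` and `fit` are illustrative names, centroids are naively initialized from the first `k` points, and the recorded standard deviation is assumed to be per feature. Cosine distance is taken as 1 minus cosine similarity.

```scala
// Hypothetical k-means sketch with a pluggable distance function.
object KMeansSketch {
  type Vec = Array[Double]
  type Distance = (Vec, Vec) => Double

  val euclidean: Distance = (a, b) =>
    math.sqrt(a.indices.map(i => math.pow(a(i) - b(i), 2)).sum)

  // Cosine distance = 1 - cosine similarity (undefined for zero vectors).
  val cosine: Distance = (a, b) => {
    val dot = a.indices.map(i => a(i) * b(i)).sum
    val norms = math.sqrt(a.map(v => v * v).sum) * math.sqrt(b.map(v => v * v).sum)
    1.0 - dot / norms
  }

  // Map the distMeasure setting to a function; euclidean is the default.
  def distanceFor(distMeasure: String): Distance = distMeasure match {
    case "cosine" => cosine
    case _        => euclidean
  }

  // Per-cluster mean (centroid) and per-feature standard deviation.
  final case class Cluster(mean: Vec, std: Vec)

  private def meanOf(points: Seq[Vec]): Vec =
    Array.tabulate(points.head.length)(j => points.map(_(j)).sum / points.size)

  private def membersOf(points: Seq[Vec], assignment: Seq[Int], c: Int): Seq[Vec] =
    points.zip(assignment).collect { case (p, a) if a == c => p }

  def fit(points: Seq[Vec], k: Int, distMeasure: String = "euclidean", maxIter: Int = 20): Seq[Cluster] = {
    val dist = distanceFor(distMeasure)
    var centroids: Seq[Vec] = points.take(k) // naive initialization (assumption)
    var assignment: Seq[Int] = Seq.empty
    var changed = true
    var iter = 0
    // Alternate assignment and update steps until assignments stop changing.
    while (changed && iter < maxIter) {
      val next = points.map(p => centroids.indices.minBy(c => dist(p, centroids(c))))
      changed = next != assignment
      assignment = next
      centroids = centroids.indices.map { c =>
        val members = membersOf(points, assignment, c)
        if (members.isEmpty) centroids(c) else meanOf(members)
      }
      iter += 1
    }
    // Record mean and per-feature standard deviation for each cluster.
    centroids.indices.map { c =>
      val members = membersOf(points, assignment, c)
      val mu = centroids(c)
      val std = Array.tabulate(mu.length) { j =>
        if (members.isEmpty) 0.0
        else math.sqrt(members.map(p => math.pow(p(j) - mu(j), 2)).sum / members.size)
      }
      Cluster(mu, std)
    }
  }
}
```

Because the distance is an ordinary function value, another metric could be added by extending `distanceFor` without touching the clustering loop. The recorded mean and spread per cluster are the kind of quantities a radial basis function centered on each cluster would need.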
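The binary Logistic Regression model can be sketched as batch gradient descent on the mean cross-entropy loss plus an L2 penalty (lambda / 2) * ||w||^2, whose gradient adds lambda * w. This is a hypothetical sketch on plain Scala collections: `LogisticRegressionSketch`, `lambda`, `learningRate` and `iterations` are illustrative names, labels are assumed to be 0.0 or 1.0, and the bias term is assumed not to be regularized.

```scala
// Hypothetical sketch of L2-regularized binary logistic regression.
object LogisticRegressionSketch {
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Returns weights where w(0) is the bias and w(i + 1) pairs with feature i.
  def fit(
      data: Seq[(Double, Array[Double])],
      lambda: Double = 0.01,
      learningRate: Double = 0.1,
      iterations: Int = 500): Array[Double] = {
    val d = data.head._2.length
    val w = Array.fill(d + 1)(0.0)
    val n = data.size.toDouble
    for (_ <- 0 until iterations) {
      // Accumulate the cross-entropy gradient: (prediction - label) * x.
      val grad = Array.fill(d + 1)(0.0)
      data.foreach { case (y, x) =>
        val err = predictProb(w, x) - y
        grad(0) += err
        x.indices.foreach(i => grad(i + 1) += err * x(i))
      }
      // Average the data gradient, add lambda * w (bias excluded), then step.
      for (j <- 0 to d) {
        val reg = if (j == 0) 0.0 else lambda * w(j)
        w(j) -= learningRate * (grad(j) / n + reg)
      }
    }
    w
  }

  // P(y = 1 | x) under the current weights.
  def predictProb(w: Array[Double], x: Array[Double]): Double =
    sigmoid(w(0) + x.indices.map(i => w(i + 1) * x(i)).sum)
}
```

The full gradient is computed before any weight changes, so every weight in an iteration is updated from the same parameter values.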
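For Softmax Logistic Regression, the idea of passing the regularization update in as a function can be sketched as below. It is a hypothetical sketch on plain Scala collections: `SoftmaxSketch`, `RegUpdate`, `noReg`, `l1`, `l1Clipped` and `l2` are illustrative names, each update is applied to a weight after its gradient step, and "L1 with clipping" is interpreted here as an L1 step that is not allowed to push a weight past zero. The bias is assumed not to be regularized.

```scala
// Hypothetical softmax regression sketch with pluggable regularization updates.
object SoftmaxSketch {
  // A regularization update maps (weight, learningRate) to the adjusted weight.
  type RegUpdate = (Double, Double) => Double

  val noReg: RegUpdate = (w, _) => w
  def l2(lambda: Double): RegUpdate = (w, lr) => w - lr * lambda * w
  def l1(lambda: Double): RegUpdate = (w, lr) => w - lr * lambda * math.signum(w)
  // Clipped L1: shrink toward zero but never cross it (interpretation, assumption).
  def l1Clipped(lambda: Double): RegUpdate = (w, lr) =>
    if (w > 0) math.max(0.0, w - lr * lambda)
    else if (w < 0) math.min(0.0, w + lr * lambda)
    else 0.0

  def softmax(z: Array[Double]): Array[Double] = {
    val m = z.max // subtract the max for numerical stability
    val e = z.map(v => math.exp(v - m))
    val s = e.sum
    e.map(_ / s)
  }

  // Linear scores per class; wk(0) is the bias for class k.
  def scores(w: Array[Array[Double]], x: Array[Double]): Array[Double] =
    w.map(wk => wk(0) + x.indices.map(i => wk(i + 1) * x(i)).sum)

  def fit(
      data: Seq[(Int, Array[Double])],
      numClasses: Int,
      reg: RegUpdate = noReg,
      lr: Double = 0.1,
      iterations: Int = 300): Array[Array[Double]] = {
    val d = data.head._2.length
    val w = Array.fill(numClasses, d + 1)(0.0)
    val n = data.size.toDouble
    for (_ <- 0 until iterations) {
      // Cross-entropy gradient for class k: (p_k - 1[y == k]) * x.
      val grad = Array.fill(numClasses, d + 1)(0.0)
      data.foreach { case (y, x) =>
        val p = softmax(scores(w, x))
        for (k <- 0 until numClasses) {
          val err = p(k) - (if (k == y) 1.0 else 0.0)
          grad(k)(0) += err
          x.indices.foreach(i => grad(k)(i + 1) += err * x(i))
        }
      }
      // Gradient step, then the pluggable regularization update on non-bias weights.
      for (k <- 0 until numClasses; j <- 0 to d) {
        val stepped = w(k)(j) - lr * grad(k)(j) / n
        w(k)(j) = if (j == 0) stepped else reg(stepped, lr)
      }
    }
    w
  }

  def predict(w: Array[Array[Double]], x: Array[Double]): Int =
    scores(w, x).zipWithIndex.maxBy(_._1)._2
}
```

Usage: `SoftmaxSketch.fit(data, numClasses = 3, reg = SoftmaxSketch.l1Clipped(0.01))`. Fixing `lambda` first and passing the resulting function keeps the training loop independent of which regularizer is chosen.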
