MindAlpha is a machine learning platform integrating PySpark, PyTorch and a parameter server implementation. The platform contains native support for sparse parameters, making it easy for users to develop large-scale models. Together with MindAlpha Serving, the platform provides a one-stop solution for data preprocessing, model training and online prediction.
- Efficient IO with PySpark: minibatches read by PySpark as pandas DataFrames can be fed directly to models.
- APIs similar to those of PyTorch and Spark MLlib, so users familiar with PyTorch and PySpark can get started quickly.
- Custom sparse layers are wrapped as PyTorch modules, making them easy to use. These sparse layers can contain billions of parameters.
- Models can be developed interactively in Jupyter Notebook, and periodic model training can be scheduled with Airflow.
- The trained model can be exported with a single method call and loaded by MindAlpha Serving for online prediction.
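To illustrate why a sparse layer can hold billions of parameters, here is a minimal pure-Python sketch. It is not MindAlpha's API: the class `SparseEmbeddingSketch`, its methods and its lazy random initialization are all hypothetical. The table allocates an embedding row only when a feature ID is first seen, so memory grows with the number of distinct IDs a model encounters rather than with the size of the ID space. Each update touches only the rows that appear in the current sample. Wrapping such a table as a PyTorch module and distributing it across machines are not modeled here.

```python
import random
from typing import Dict, Iterable, List


class SparseEmbeddingSketch:
    """Hypothetical sparse embedding table (illustration only, not MindAlpha's API).

    Rows are created lazily on first lookup, so memory scales with the number
    of distinct feature IDs seen, not with the size of the ID space.
    """

    def __init__(self, dim: int, init_scale: float = 0.01, seed: int = 0) -> None:
        self.dim = dim
        self.init_scale = init_scale
        self.rng = random.Random(seed)
        self.rows: Dict[int, List[float]] = {}

    def _row(self, feature_id: int) -> List[float]:
        # Allocate a small random vector the first time an ID is seen.
        if feature_id not in self.rows:
            self.rows[feature_id] = [
                self.rng.uniform(-self.init_scale, self.init_scale)
                for _ in range(self.dim)
            ]
        return self.rows[feature_id]

    def lookup_sum(self, feature_ids: Iterable[int]) -> List[float]:
        # Sum-pool the embeddings of one sample's feature IDs.
        pooled = [0.0] * self.dim
        for fid in feature_ids:
            for i, value in enumerate(self._row(fid)):
                pooled[i] += value
        return pooled

    def sgd_update(self, feature_ids: Iterable[int], grad: List[float], lr: float) -> None:
        # With sum pooling, each contributing row receives the pooled gradient
        # (once per occurrence); rows not in this sample are left untouched.
        for fid in feature_ids:
            row = self._row(fid)
            for i in range(self.dim):
                row[i] -= lr * grad[i]


if __name__ == "__main__":
    table = SparseEmbeddingSketch(dim=4)
    # IDs may come from a huge hashed space; only two rows are actually stored.
    vec = table.lookup_sum([3, 2**60 - 1])
    print(len(vec), len(table.rows))
```

The dictionary stands in for whatever storage a real system uses; the point is only that lookups and updates touch a small subset of rows per minibatch.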
First, run the script to build a Docker image:
sh run_build.sh -i
For more details, please refer to docker/ubuntu20.04/Dockerfile and docker/centos7/Dockerfile.
Then run the script to compile the sources (C++ and Python) into dynamic-link libraries (*.so) and Python install packages (*.whl), which are generated in the build directory by default:
sh run_build.sh -m
Two tutorials are provided:
- MindAlpha Getting Started briefly introduces the basic MindAlpha API.
- MindAlpha Tutorial shows how to use MindAlpha in the production environment.