TorchX is a universal job launcher for PyTorch applications. It is designed for fast iteration during training and research, and it supports end-to-end production ML pipelines when you're ready.
TorchX currently supports:
- Kubernetes (EKS, GKE, AKS, etc.)
- Slurm
- AWS Batch
- Docker
- Local
- Ray (prototype)
Need a scheduler not listed? Let us know!
See the quickstart guide.
torchx:
Some schedulers have additional, scheduler-specific requirements. See the installation instructions for details.
# install torchx sdk and CLI -- minimum dependencies
pip install torchx
# install torchx sdk and CLI -- all dependencies
pip install "torchx[dev]"
# install torchx kubeflow pipelines (kfp) support
pip install "torchx[kfp]"
# install torchx Kubernetes / Volcano support
pip install "torchx[kubernetes]"
# install torchx Ray support
pip install "torchx[ray]"
# install torchx sdk and CLI (nightly build)
pip install "torchx-nightly[dev]"
# install torchx sdk and CLI from source
pip install -e "git+https://github.com/pytorch/torchx.git#egg=torchx"
# install extra dependencies
pip install -e "git+https://github.com/pytorch/torchx.git#egg=torchx[dev]"
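After any of the install routes above, it can be handy to confirm which TorchX distribution actually ended up in the active environment. The following is a minimal sketch that uses only the Python standard library (`importlib.metadata`). The helper names `installed_version` and `describe_install` are hypothetical and not part of TorchX, and the sketch assumes the distributions are named `torchx` and `torchx-nightly`, as in the pip commands above.

```python
from importlib import metadata
from typing import Optional, Sequence


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def describe_install(
    dist_names: Sequence[str] = ("torchx", "torchx-nightly"),
) -> str:
    """Summarize which of the given distributions, if any, is installed.

    The default names are an assumption taken from the pip commands above.
    """
    for name in dist_names:
        version = installed_version(name)
        if version is not None:
            return f"{name} {version} is installed"
    return "TorchX is not installed; run one of the pip commands above"


if __name__ == "__main__":
    print(describe_install())
```

Running the script prints a one-line summary. If it reports that TorchX is not installed, check that the interpreter running the script is the same one that `pip` installed into.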
TorchX provides a Docker container for use as part of a TorchX role.
See: https://github.com/pytorch/torchx/pkgs/container/torchx
We welcome PRs! See the CONTRIBUTING file.
TorchX is BSD licensed, as found in the LICENSE file.