MXNet

MXNet is a deep learning platform that accelerates the transition from research prototyping to production deployment. It integrates fully with Python, so you can use it alongside Python's libraries and main packages.

TL;DR

$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install my-release bitnami/mxnet

Introduction

This chart bootstraps an MXNet deployment on a Kubernetes cluster using the Helm package manager.

Bitnami charts can be used with Kubeapps for deployment and management of Helm Charts in clusters. This Helm chart has been tested on top of Bitnami Kubernetes Production Runtime (BKPR). Deploy BKPR to get automated TLS certificates, logging and monitoring for your applications.

Prerequisites

Kubernetes 1.12+
Helm 2.12+ or Helm 3.0-beta3+
PV provisioner support in the underlying infrastructure
ReadWriteMany volumes for deployment scaling

Installing the Chart

To install the chart with the release name my-release:

$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install my-release bitnami/mxnet

These commands deploy MXNet on the Kubernetes cluster in the default configuration. The Parameters section lists the parameters that can be configured.

Tip: List all releases using helm list

Uninstalling the Chart

To uninstall/delete the my-release deployment:

$ helm delete my-release

The command removes all the Kubernetes components associated with the chart and deletes the release.

Parameters

The following table lists the configurable parameters of the MXNet chart and their default values.

Parameter	Description	Default
`global.imageRegistry`	Global Docker image registry	`nil`
`global.imagePullSecrets`	Global Docker registry secret names as an array	`[]` (does not add image pull secrets to deployed pods)
`global.storageClass`	Global storage class for dynamic provisioning	`nil`
`image.registry`	MXNet image registry	`docker.io`
`image.repository`	MXNet image name	`bitnami/mxnet`
`image.tag`	MXNet image tag	`{TAG_NAME}`
`image.pullPolicy`	Image pull policy	`IfNotPresent`
`image.pullSecrets`	Specify docker-registry secret names as an array	`[]` (does not add image pull secrets to deployed pods)
`image.debug`	Specify if debug logs should be enabled	`false`
`git.registry`	Git image registry	`docker.io`
`git.repository`	Git image name	`bitnami/git`
`git.tag`	Git image tag	`{TAG_NAME}`
`git.pullPolicy`	Git image pull policy	`IfNotPresent`
`git.pullSecrets`	Specify docker-registry secret names as an array	`[]` (does not add image pull secrets to deployed pods)
`nameOverride`	String to partially override mxnet.fullname template with a string (will prepend the release name)	`nil`
`fullnameOverride`	String to fully override mxnet.fullname template with a string	`nil`
`volumePermissions.enabled`	Enable init container that changes volume permissions in the data directory (for cases where the default k8s `runAsUser` and `fsGroup` values do not work)	`false`
`volumePermissions.image.registry`	Init container volume-permissions image registry	`docker.io`
`volumePermissions.image.repository`	Init container volume-permissions image name	`bitnami/minideb`
`volumePermissions.image.tag`	Init container volume-permissions image tag	`buster`
`volumePermissions.image.pullPolicy`	Init container volume-permissions image pull policy	`Always`
`volumePermissions.resources`	Init container resource requests/limit	`nil`
`service.type`	Kubernetes service type	`ClusterIP`
`entrypoint.file`	Main entrypoint to your application. If not specified, it will be a `sleep infinity` command	`''`
`entrypoint.args`	Args required by your entrypoint	`nil`
`entrypoint.workDir`	Working directory for launching the entrypoint	`'/app'`
`podManagementPolicy`	StatefulSet (worker and server nodes) pod management policy	`Parallel`
`mode`	Run MXNet in standalone or distributed mode (possible values: `standalone`, `distributed`)	`standalone`
`serverCount`	Number of server nodes that will execute your code	`1`
`workerCount`	Number of worker nodes that will execute your code	`1`
`schedulerPort`	MXNet scheduler port (only for distributed mode)	`49875`
`configMap`	Config map that contains the files you want to load in MXNet	`nil`
`cloneFilesFromGit.enabled`	Enable in order to download files from git repository	`false`
`cloneFilesFromGit.repository`	Repository that holds the files	`nil`
`cloneFilesFromGit.revision`	Revision from the repository to checkout	`master`
`commonExtraEnvVars`	Extra environment variables to add to server, scheduler and worker nodes	`nil`
`workerExtraEnvVars`	Extra environment variables to add to the worker nodes	`nil`
`serverExtraEnvVars`	Extra environment variables to add to the server nodes	`nil`
`schedulerExtraEnvVars`	Extra environment variables to add to the scheduler node	`nil`
`existingSecret`	Name of a secret with sensitive data to mount in the pods	`nil`
`nodeSelector`	Node labels for pod assignment (this value is evaluated as a template)	`{}`
`tolerations`	Toleration labels for pod assignment (this value is evaluated as a template)	`[]`
`affinity`	Map of node/pod affinities (this value is evaluated as a template)	`{}`
`resources`	Pod resources	`{}`
`securityContext.enabled`	Enable security context	`true`
`securityContext.fsGroup`	Group ID for the container	`1001`
`securityContext.runAsUser`	User ID for the container	`1001`
`livenessProbe.enabled`	Enable/disable the Liveness probe	`true`
`livenessProbe.initialDelaySeconds`	Delay before liveness probe is initiated	`5`
`livenessProbe.periodSeconds`	How often to perform the probe	`5`
`livenessProbe.timeoutSeconds`	When the probe times out	`5`
`livenessProbe.successThreshold`	Minimum consecutive successes for the probe to be considered successful after having failed.	`1`
`livenessProbe.failureThreshold`	Minimum consecutive failures for the probe to be considered failed after having succeeded.	`5`
`readinessProbe.enabled`	Enable/disable the Readiness probe	`true`
`readinessProbe.initialDelaySeconds`	Delay before readiness probe is initiated	`5`
`readinessProbe.periodSeconds`	How often to perform the probe	`5`
`readinessProbe.timeoutSeconds`	When the probe times out	`1`
`readinessProbe.successThreshold`	Minimum consecutive successes for the probe to be considered successful after having failed.	`1`
`readinessProbe.failureThreshold`	Minimum consecutive failures for the probe to be considered failed after having succeeded.	`5`
`persistence.enabled`	Use a PVC to persist data	`false`
`persistence.mountPath`	Path to mount the volume at	`/bitnami/mxnet`
`persistence.storageClass`	Storage class of backing PVC	`nil` (uses alpha storage class annotation)
`persistence.accessMode`	Use volume as ReadOnly or ReadWrite	`ReadWriteOnce`
`persistence.size`	Size of data volume	`8Gi`
`persistence.annotations`	Persistent Volume annotations	`{}`
`sidecars`	Attach additional containers to the pods (scheduler, worker and server nodes)	`[]`
`initContainers`	Attach additional init containers to the pods (scheduler, worker and server nodes)	`[]`

Specify each parameter using the --set key=value[,key=value] argument to helm install. For example,

$ helm install my-release \
  --set mode=distributed \
  --set serverCount=2 \
  --set workerCount=3 \
    bitnami/mxnet

The above command creates 6 pods for MXNet: one scheduler, two servers, and three workers.

Alternatively, a YAML file that specifies the values for the parameters can be provided while installing the chart. For example,

$ helm install my-release -f values.yaml bitnami/mxnet

Tip: You can use the default values.yaml
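
As a minimal sketch, the --set example above could instead be kept in a values file. The file name values-distributed.yaml is a hypothetical choice; the keys are the ones listed in the parameters table.

```yaml
# values-distributed.yaml (hypothetical file name)
# Equivalent to: --set mode=distributed --set serverCount=2 --set workerCount=3
mode: distributed
serverCount: 2   # number of server nodes
workerCount: 3   # number of worker nodes
```

It would then be passed with helm install my-release -f values-distributed.yaml bitnami/mxnet.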

Configuration and installation details

Rolling VS Immutable tags

It is strongly recommended to use immutable tags in a production environment. This ensures your deployment does not change automatically if the same tag is updated with a different image.

Bitnami will release a new chart updating its containers if a new version of the main container, significant changes, or critical vulnerabilities exist.
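
A sketch of pinning the image in a values file is shown below. The tag value is a placeholder, not a real tag; replace it with a specific immutable tag published for the image.

```yaml
image:
  registry: docker.io
  repository: bitnami/mxnet
  # Placeholder: replace with a specific immutable tag, not a rolling one
  tag: REPLACE_WITH_IMMUTABLE_TAG
  pullPolicy: IfNotPresent
```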

Production configuration

This chart includes a values-production.yaml file where you can find some parameters oriented to production configuration in comparison to the regular values.yaml. You can use this file instead of the default one.

Run MXNet in distributed mode:

- mode: standalone
+ mode: distributed

Number of server nodes that will execute your code:

- serverCount: 1
+ serverCount: 2

Number of worker nodes that will execute your code:

- workerCount: 1
+ workerCount: 4

Loading your files

The MXNet chart supports three different ways to load your files. In order of priority, they are:

Existing config map
Files under the files directory
Cloning a git repository

This means that if you specify a config map with your files, the chart won't look in the files/ directory or the git repository.

To use an existing config map, set the configMap=my-config-map parameter.
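
For illustration, a config map holding a single script might look like the sketch below. The file name train.py and its contents are hypothetical placeholders; only the name my-config-map comes from the text above.

```yaml
# my-config-map.yaml: a config map that carries one script.
# train.py and its contents are hypothetical placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-config-map
data:
  train.py: |
    import mxnet as mx
    print(mx.__version__)
```

After creating it with kubectl apply -f my-config-map.yaml in the release namespace, you would install the chart with configMap=my-config-map. Assuming the chart makes the files available in the entrypoint working directory, entrypoint.file=train.py would then select the script.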

To load your files from the files/ directory you don't have to set any option. Just copy your files inside and don't specify a ConfigMap.

Finally, if you want to clone a git repository you can use the following parameters:

cloneFilesFromGit.enabled=true
cloneFilesFromGit.repository=https://github.com/my-user/my-repo
cloneFilesFromGit.revision=master
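
The same settings can be sketched as a values file fragment. The repository URL is the placeholder used above, not a real repository.

```yaml
cloneFilesFromGit:
  enabled: true
  # Placeholder URL: point this at the repository that holds your files
  repository: https://github.com/my-user/my-repo
  revision: master
```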

In case you want to add a file that includes sensitive information, pass a secret object using the existingSecret parameter. All the files in the secret will be mounted in the /secrets folder.
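
As a rough sketch, such a secret could be defined as follows. The secret name my-mxnet-secret, the file name credentials.json, and its contents are all hypothetical.

```yaml
# my-mxnet-secret.yaml: every name and value here is a hypothetical placeholder.
apiVersion: v1
kind: Secret
metadata:
  name: my-mxnet-secret
type: Opaque
stringData:
  # Each key becomes one file in the mounted folder
  credentials.json: |
    {"token": "replace-me"}
```

With existingSecret=my-mxnet-secret set, this file would be expected at /secrets/credentials.json inside the pods.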

Distributed training example

We will use the gluon example from the MXNet official repository. Launch it with the following values:

mode=distributed
cloneFilesFromGit.enabled=true
cloneFilesFromGit.repository=https://github.com/apache/incubator-mxnet.git
cloneFilesFromGit.revision=master
entrypoint.file=image_classification.py
entrypoint.args="--dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync"
entrypoint.workDir=/app/example/gluon/

Check the logs of the worker node:

INFO:root:Starting new image-classification task:, Namespace(batch_norm=False, batch_size=32, builtin_profiler=0, data_dir='', dataset='cifar10', dtype='float32', epochs=1, gpus='', kvstore='dist_sync', log_interval=50, lr=0.1, lr_factor=0.1, lr_steps='30,60,90', mode=None, model='vgg11', momentum=0.9, num_workers=4, prefix='', profile=False, resume='', save_frequency=10, seed=123, start_epoch=0, use_pretrained=False, use_thumbnail=False, wd=0.0001)
INFO:root:downloaded http://data.mxnet.io/mxnet/data/cifar10.zip into data/cifar10.zip successfully
[10:05:40] src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: data/cifar/train.rec, use 1 threads for decoding..
[10:05:45] src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: data/cifar/test.rec, use 1 threads for decoding..

If you want to increase the verbosity, set the environment variable PS_VERBOSE=1 or PS_VERBOSE=2 using the commonExtraEnvVars value.

mode=distributed
cloneFilesFromGit.enabled=true
cloneFilesFromGit.repository=https://github.com/apache/incubator-mxnet.git
cloneFilesFromGit.revision=master
entrypoint.file=image_classification.py
entrypoint.args="--dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync"
entrypoint.workDir=/app/example/gluon/
commonExtraEnvVars[0].name=PS_VERBOSE
commonExtraEnvVars[0].value=1
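
The values above can also be written as a values file, sketched here under the assumption that the dotted parameter names map directly onto nested YAML keys. The file name is up to you.

```yaml
mode: distributed
cloneFilesFromGit:
  enabled: true
  repository: https://github.com/apache/incubator-mxnet.git
  revision: master
entrypoint:
  file: image_classification.py
  args: "--dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync"
  workDir: /app/example/gluon/
commonExtraEnvVars:
  - name: PS_VERBOSE
    value: "1"   # quoted, because Kubernetes env var values must be strings
```

Pass the file with the -f flag of helm install, as described in the Parameters section.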

You will now see log entries in the scheduler and server nodes.

[14:22:44] src/van.cc:290: Bind to role=scheduler, id=1, ip=10.32.0.11, port=9092, is_recovery=0
[14:22:53] src/van.cc:56: assign rank=9 to node role=worker, ip=10.32.0.17, port=55423, is_recovery=0
[14:22:53] src/van.cc:56: assign rank=11 to node role=worker, ip=10.32.0.16, port=60779, is_recovery=0
[14:22:53] src/van.cc:56: assign rank=13 to node role=worker, ip=10.32.0.15, port=39817, is_recovery=0
[14:22:53] src/van.cc:56: assign rank=15 to node role=worker, ip=10.32.0.14, port=48119, is_recovery=0
[14:22:53] src/van.cc:56: assign rank=8 to node role=server, ip=10.32.0.13, port=56713, is_recovery=0
[14:22:53] src/van.cc:56: assign rank=10 to node role=server, ip=10.32.0.12, port=57099, is_recovery=0
[14:22:53] src/van.cc:83: the scheduler is connected to 4 workers and 2 servers
[14:22:53] src/van.cc:183: Barrier count for 7 : 1
[14:22:53] src/van.cc:183: Barrier count for 7 : 2
[14:22:53] src/van.cc:183: Barrier count for 7 : 3
[14:22:53] src/van.cc:183: Barrier count for 7 : 4
...

Sidecars and Init Containers

If you have a need for additional containers to run within the same pod as MXNet (e.g. an additional metrics or logging exporter), you can do so via the sidecars config parameter. Simply define your container according to the Kubernetes container spec.

sidecars:
- name: your-image-name
  image: your-image
  imagePullPolicy: Always
  ports:
  - name: portname
    containerPort: 1234

Similarly, you can add extra init containers using the initContainers parameter.

initContainers:
- name: your-image-name
  image: your-image
  imagePullPolicy: Always
  ports:
  - name: portname
    containerPort: 1234

Persistence

The Bitnami MXNet image can persist data. If enabled, the persisted path is /bitnami/mxnet by default.

The chart mounts a Persistent Volume at this location. The volume is created using dynamic volume provisioning.
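
A minimal sketch of enabling persistence through a values file, using the keys and defaults from the parameters table:

```yaml
persistence:
  enabled: true               # default is false
  mountPath: /bitnami/mxnet
  accessMode: ReadWriteOnce
  size: 8Gi
```

The prerequisites mention ReadWriteMany volumes for deployment scaling, so accessMode: ReadWriteMany may be the appropriate choice when several nodes share the volume, provided the storage class supports it.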

Adjust permissions of persistent volume mountpoint

As the image runs as non-root by default, it is necessary to adjust the ownership of the persistent volume so that the container can write data into it.

By default, the chart is configured to use Kubernetes Security Context to automatically change the ownership of the volume. However, this feature does not work in all Kubernetes distributions. As an alternative, this chart supports using an initContainer to change the ownership of the volume before mounting it in the final destination.

You can enable this initContainer by setting volumePermissions.enabled to true.
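
As a sketch, the corresponding values fragment might look like this. The image settings repeat the defaults from the parameters table and can be omitted.

```yaml
persistence:
  enabled: true
volumePermissions:
  enabled: true        # run the init container that fixes volume ownership
  image:
    registry: docker.io
    repository: bitnami/minideb
    tag: buster
```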
