Interactive example with skein
Here is an interactive example using Skein and HDFS storage with a virtual environment. You can also execute it directly in a Jupyter notebook.
- Prepare a virtual environment with skein & numpy
$ cd examples/interactive-mode
$ python3 -m venv venv
$ . venv/bin/activate
$ pip install cluster-pack numpy skein
$ python
- Define the workload to execute remotely
import numpy as np

def compute_intersection():
    # randint's upper bound is exclusive, so 101 keeps 100 in range
    a = np.random.randint(0, 101, 100)
    b = np.random.randint(0, 101, 100)
    print("Computed intersection of two arrays:")
    print(np.intersect1d(a, b))
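To make clear what the workload prints, here is a minimal dependency-free sketch of the same computation. It is not part of cluster-pack; the function name `intersection_of_random_arrays` and its parameters are hypothetical, and the standard `random` module stands in for NumPy. Like `np.intersect1d`, it returns the sorted, unique values common to both inputs.

```python
import random


def intersection_of_random_arrays(seed=0, size=100, low=0, high=100):
    """Pure-Python stand-in for the NumPy workload (illustrative only)."""
    rng = random.Random(seed)  # seeded so repeated runs build the same arrays
    # random.Random.randint includes both bounds
    a = [rng.randint(low, high) for _ in range(size)]
    b = [rng.randint(low, high) for _ in range(size)]
    # Sorted, de-duplicated values present in both lists
    return sorted(set(a) & set(b))


common = intersection_of_random_arrays()
print("Computed intersection of two arrays:")
print(common)
```

Running this locally gives a rough idea of the output to look for in the application logs once the real function has run on the cluster.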
- Upload current virtual environment to the distributed storage (HDFS in this case)
import cluster_pack
package_path, _ = cluster_pack.upload_env()
- Call the skein config helper to build a config that runs this function on the cluster
from cluster_pack.skein import skein_config_builder
skein_config = skein_config_builder.build_with_func(
    func=compute_intersection,
    package_path=package_path
)
- Submit a simple skein application
import skein
with skein.Client() as client:
    service = skein.Service(
        resources=skein.model.Resources("1 GiB", 1),
        files=skein_config.files,
        script=skein_config.script
    )
    spec = skein.ApplicationSpec(services={"service": service})
    app_id = client.submit(spec)
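Submitting returns as soon as the application is accepted, so an interactive session may want to wait for the result. The sketch below is one possible way to poll for completion. The helper `wait_for_final_status` and its parameters are hypothetical. It assumes the client exposes `application_report(app_id)`, and that the report's `final_status` renders as `SUCCEEDED`, `FAILED` or `KILLED` once the application has finished. Check both assumptions against the skein version you have installed.

```python
import time

# Assumption: names reported once an application has finished.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "KILLED"}


def wait_for_final_status(client, app_id, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll until the application reports a terminal status, or time out.

    `client` only needs an `application_report(app_id)` method returning an
    object with a `final_status` attribute (assumed to match skein.Client).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        report = client.application_report(app_id)
        status = str(report.final_status)
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{app_id} still reports {status} after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)
```

Calling `wait_for_final_status(client, app_id)` inside the `with skein.Client() as client:` block, right after `client.submit(spec)`, would block until a final status is reported or the timeout expires.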