This repository contains an end-to-end example solution, based on the Computer Hardware dataset, built with Azure Data Factory, Azure Data Lake Gen 2, and the Azure Machine Learning Python SDK to ingest data from multiple data sources, build machine learning models, and serve those models as HTTP endpoints.
- User has data sources in Azure SQL Database and Azure Cosmos DB
- User ingests data from the data sources into Azure Data Lake Gen 2 with Azure Data Factory
- User performs data preparation using Azure Data Factory Wrangling Data Flow
- User trains a machine learning model using Azure Machine Learning service
- User deploys the machine learning model to Azure Container Instances using the Azure Machine Learning Python SDK
Environment preparation
- Run `AZ_SUBSCRIPTION_ID='{subscription-id}' AZ_BASE_NAME='{unique-base-name}' AZ_REGION='{azure-region}' ./build_environment.sh` to provision the Azure environment
- Through Azure Storage Explorer, upload the data files from `./data/*` to the ADLSG2 "demo-prep" container
- Through the ADF portal, execute pipeline "PL_E2E_Demo_Prep" (under the "Demp-Prep" folder) to hydrate Azure Cosmos DB and Azure SQL Database
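If you want to drive the provisioning step from a script rather than typing the one-liner, the environment-variable prefix can be assembled safely in Python. This is an illustrative helper, not part of the repo; `build_provision_command` is a hypothetical name:

```python
import shlex

def build_provision_command(subscription_id: str, base_name: str, region: str) -> str:
    """Assemble the env-var-prefixed call to build_environment.sh,
    quoting each value so spaces or shell metacharacters stay intact."""
    env = {
        "AZ_SUBSCRIPTION_ID": subscription_id,
        "AZ_BASE_NAME": base_name,
        "AZ_REGION": region,
    }
    prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    return f"{prefix} ./build_environment.sh"

print(build_provision_command("abc-123", "e2edemo", "eastus"))
# AZ_SUBSCRIPTION_ID=abc-123 AZ_BASE_NAME=e2edemo AZ_REGION=eastus ./build_environment.sh
```

`shlex.quote` only adds quotes when a value actually needs them, so typical region names pass through unchanged.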
- Through the ADF portal, execute pipeline "PL_E2E_MachineData" to hydrate Azure Data Lake Gen 2 and curate the raw data into the curated zone
- Through Azure Machine Learning studio (preview):
  - Upgrade the AML workspace to Enterprise edition. This is required for the advanced AutoML features that this solution uses.
  - Create a notebook VM (NBVM) with a unique VM name and VM size "STANDARD_DS3_V2"
  - Note: make sure that AML studio is scoped to the AML workspace created by the build automation
- Create a service principal using the following command and note the output (it is needed later in the AML notebook):

  ```shell
  az ad sp create-for-rbac \
    -n "{unique-sp-name}" \
    --role 'Storage Blob Data Reader' \
    --scopes /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.Storage/storageAccounts/{adlsg2-name}
  ```
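`az ad sp create-for-rbac` prints its result as JSON; the fields you will need later are `appId` (client id), `password` (client secret), and `tenant`. A small Python sketch of pulling those out — the sample values and the helper name are illustrative only:

```python
import json

# Abridged shape of the `az ad sp create-for-rbac` JSON output (sample values).
sample_output = """
{
  "appId": "00000000-0000-0000-0000-000000000000",
  "displayName": "my-unique-sp-name",
  "password": "<client-secret>",
  "tenant": "11111111-1111-1111-1111-111111111111"
}
"""

def extract_sp_credentials(raw: str) -> dict:
    """Map the az CLI output fields to the names a notebook typically asks for."""
    sp = json.loads(raw)
    return {
        "client_id": sp["appId"],
        "client_secret": sp["password"],
        "tenant_id": sp["tenant"],
    }

print(extract_sp_credentials(sample_output)["client_id"])
```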
- Through the AML NBVM Jupyter:
  - Create a new terminal and clone this repository (note: `git` is pre-installed on the AML NBVM)
  - Open and walk through `azure-e2e-ml/aml/configuration.ipynb` to configure the local environment with AML configurations
    - Note: you have to replace the default values of `SUBSCRIPTION_ID`, `RESOURCE_GROUP`, `WORKSPACE_NAME`, and `WORKSPACE_REGION` with appropriate values in this notebook
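Those configuration values are plain notebook variables. An illustrative sketch of what the cell boils down to — the placeholder values and the `unconfigured` check are hypothetical additions, not taken from the notebook:

```python
# Placeholder defaults as assumed here; replace each before running the notebook.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group-name>"
WORKSPACE_NAME = "<aml-workspace-name>"
WORKSPACE_REGION = "<azure-region>"

def unconfigured(settings: dict) -> list:
    """Return the names of settings still left at an angle-bracket placeholder."""
    return [k for k, v in settings.items() if v.startswith("<") and v.endswith(">")]

print(unconfigured({
    "SUBSCRIPTION_ID": SUBSCRIPTION_ID,
    "RESOURCE_GROUP": RESOURCE_GROUP,
    "WORKSPACE_NAME": WORKSPACE_NAME,
    "WORKSPACE_REGION": WORKSPACE_REGION,
}))
```

A guard like this catches the common mistake of running the notebook with a default still in place.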
  - Open and walk through `azure-e2e-ml/aml/auto-ml-regression-hardware-performance-explanation-and-featurization.ipynb` to build and deploy the model
    - Note: the mini-widget is currently not supported in JupyterLab, so we use Jupyter to execute the notebook. A GitHub issue is open to track this.
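The deployment notebook ultimately stands up an ACI web service around an entry (scoring) script with an `init()`/`run()` contract. Below is a minimal local sketch of that contract with a stub in place of the real AutoML model, which a real script would load in `init()` via the Azure ML SDK; everything here is illustrative:

```python
import json

model = None

def init():
    """Called once when the service container starts; load the model here.
    A stub predictor stands in for the registered AutoML model."""
    global model
    model = lambda rows: [sum(row) for row in rows]

def run(raw_data: str) -> str:
    """Called once per request: JSON string in, JSON string out."""
    try:
        rows = json.loads(raw_data)["data"]
        return json.dumps({"result": model(rows)})
    except Exception as exc:  # surface bad input to the caller as JSON
        return json.dumps({"error": str(exc)})

init()
print(run(json.dumps({"data": [[1, 2, 3]]})))  # {"result": [6]}
```

Keeping `run()` tolerant of malformed payloads makes the HTTP endpoint return a useful error body instead of a bare 500.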
- Architecture diagram
- Automation: build script
- No warranties or guarantees are made or implied.
- All assets here are provided by me "as is". Use at your own risk. Validate before use.
- I am not representing my employer with these assets, and my employer assumes no liability whatsoever, and will not provide support, for any use of these assets.
- Use of the assets in this repo in your Azure environment may, and in most cases will, incur Azure usage and charges. You are completely responsible for monitoring and managing your Azure usage.
Unless otherwise noted, all assets here are authored by me. Feel free to examine, learn from, comment, and re-use (subject to the above) as needed and without intellectual property restrictions.
If anything here helps you, attribution and/or a quick note is much appreciated.