
Create guide for Machine Learning Engine operators #8207 #8968

Closed

Conversation

U-Ozdemir

This is my first time working on an open source project, and I am posting here my first attempt at an ML operator guide. It is still in progress, but any feedback is welcome.

Guide for ML operators (work in progress)

AI Platform:

  • The AI Platform is used to train your machine learning models at scale, to host your trained model in the cloud, and to use your model to make predictions about new data.

  • Machine learning (ML) is a subfield of artificial intelligence (AI). The goal of ML is to make computers learn from the data that you give them. Instead of writing code that describes the action the computer should take, your code provides an algorithm that adapts based on examples of intended behavior. The resulting program, consisting of the algorithm and associated learned parameters, is called a trained model.

Prerequisite Tasks

To use these operators, you must first install the gcp extra:

pip install 'apache-airflow[gcp]'

Detailed information is available in the Installation documentation.
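
The snippets below reference a few constants. As a minimal sketch of what they might look like (all values here are hypothetical placeholders, not part of the original guide):

PROJECT_ID = "my-gcp-project"        # hypothetical Google Cloud project id
MODEL_NAME = "my_model"              # hypothetical model name
JOB_DIR = "gs://my-bucket/job-dir"   # hypothetical Cloud Storage path with training output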

Operators

  • You will need to set up a Python dictionary, default_args, containing arguments that are applied to every task in your workflow. It is shown below, followed by a sketch of how it plugs into a DAG.
  • start_date determines the execution date of the first DAG run.
  • params is a dictionary of DAG-level parameters that are made accessible in templates, namespaced under params. These params can be overridden at the task level.
from airflow.utils.dates import days_ago

default_args = {
    "start_date": days_ago(1),
    "params": {
        "model_name": MODEL_NAME,
    },
}
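
As a sketch of how default_args is wired into a DAG (the DAG id and schedule here are illustrative, not part of the original guide); note that the params entry makes {{ params.model_name }} available in templated fields:

from airflow import models

with models.DAG(
    "gcp_mlengine_guide",        # hypothetical DAG id
    default_args=default_args,
    schedule_interval=None,      # trigger manually for this example
) as dag:
    # The operator tasks shown in the sections below would be defined here.
    ...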

MLEngineManageModelOperator

  • Use the MLEngineManageModelOperator to create an ML model. The task_id is a short, descriptive identifier for the task (creating a model in this case). project_id is the ID of your Google Cloud project, and the name key in the model dictionary sets the name of the model.
create_model = MLEngineManageModelOperator(
    task_id="create-model",
    project_id=PROJECT_ID,
    operation="create",
    model={
        "name": MODEL_NAME,
    },
)
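
Besides 'create', the operator also supports a 'get' operation to fetch an existing model; a sketch (the task id here is illustrative):

get_model = MLEngineManageModelOperator(
    task_id="get-model",
    project_id=PROJECT_ID,
    operation="get",
    model={
        "name": MODEL_NAME,
    },
)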

MLEngineCreateVersionOperator

  • With the MLEngineCreateVersionOperator you can create a new version of a model. The task_id is a unique, meaningful id for the task, project_id is the ID of the project the model belongs to, model_name is the name of your model, and version is a dictionary describing the version to deploy; its deployment_uri should point to the Cloud Storage location of the exported model, so JOB_DIR here is a gs:// path. (Do I need to explain the arguments in version? I don't see this done in other examples.)
create_version = MLEngineCreateVersionOperator(
    task_id="create-version",
    project_id=PROJECT_ID,
    model_name=MODEL_NAME,
    version={
        "name": "v1",
        "description": "First-version",
        "deployment_uri": "{}/keras_export/".format(JOB_DIR),
        "runtime_version": "1.14",
        "machineType": "mls1-c1-m2",
        "framework": "TENSORFLOW",
        "pythonVersion": "3.5",
    },
)

MLEngineDeleteVersionOperator

  • Use the MLEngineDeleteVersionOperator to delete a single version of your ML model. The task_id is a descriptive name for the task, project_id is the ID of your project, model_name is the name of your model, and version_name is the version you want to delete.
delete_version = MLEngineDeleteVersionOperator(
    task_id="delete-version",
    project_id=PROJECT_ID,
    model_name=MODEL_NAME,
    version_name="v1",
)

MLEngineDeleteModelOperator

  • The MLEngineDeleteModelOperator deletes your whole ML model. The task_id is a descriptive name for the task, project_id is the ID of your project, and model_name is the name of your model. When delete_contents is set to True, everything within the model (including all of its versions) is deleted as well.
delete_model = MLEngineDeleteModelOperator(
    task_id="delete-model",
    project_id=PROJECT_ID,
    model_name=MODEL_NAME,
    delete_contents=True,
)
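
To tie the tasks together, they can be chained with the usual Airflow bit-shift syntax. The ordering below is only an illustrative sketch (create the model, deploy a version, then clean up), not part of the original guide:

create_model >> create_version >> delete_version >> delete_model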

Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Unit tests coverage for changes (not needed for documentation changes)
  • Target Github ISSUE in description if exists
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

oxymor0n and others added 30 commits April 30, 2020 10:16
…8625)

PostgresHook's parent class, DbApiHook, implements upsert in its insert_rows() method
with the replace=True flag. However, the underlying generated SQL is specific to MySQL's
"REPLACE INTO" syntax and is not applicable to PostgreSQL.

This pulls out the sql generation code for insert/upsert out in to a method that is then
overridden in the PostgreSQL subclass to generate the "INSERT ... ON CONFLICT DO
UPDATE" syntax ("new" since Postgres 9.5)
Right now requirements will only be checked during the CI build
if the setup.py has changed, and if so, clear instructions
will be given. The diff will still be printed otherwise, but
it will not cause the job to fail.
Their response format is like {"example_dag_id": [{"state": "success", "dag_id": "example_dag_id"}, ...], ...}

The dag_id is already used as the "key", but it still repeatedly appears
in each element, which makes the response payload unnecessarily large.
Allow EmrCreateJobFlowOperator and EmrAddStepsOperator
to receive their 'job_flow_overrides', and 'steps'
arguments respectively as Jinja template filenames.
This is similar to BashOperator's capability of
receiving a filename as its 'bash_command' argument.
Changes deprecated config check rules. Now uses regex to look for an old pattern in the val. Updates 'hostname_callable'.

This lets us pull the change back in to 1.10.x, so that by the time 2.0 is around people will have had time and notice to update, without reading (the now quite long) UPDATING.md.

Depends on #8463
…8663)

Otherwise the behaviour in UI is incorrect
Addressing issue #8662
When using KubernetesExecutor without any centralized PV for log storage, one has to wait until the logs get uploaded to cloud storage before viewing them on UI. With this change, the webserver will try to fetch logs from running worker pods and display them.
connection add/edit UI pages were not working correctly for Spark connections. 

The root cause is that "spark" is not listed in models.Connection._types.
So when www/forms.py tries to produce the UI,
"spark" is not available and it always falls back to the option list
whose first entry is "Docker".

In addition, we should hide irrelevant entries for
spark connections ("schema", "login", and "password")
Currently the connection type list in the UI is sorted in the original order of
`Connection._types`, which may be a bit inconvenient for users.

It would be better if it can be sorted alphabetically.
* Remove config side effects

* Fix LatestOnlyOperator return type to be json serializable

* Fix tests/test_configuration.py

* Fix tests/executors/test_dask_executor.py

* Fix tests/jobs/test_scheduler_job.py

* Fix tests/models/test_cleartasks.py

* Fix tests/models/test_taskinstance.py

* Fix tests/models/test_xcom.py

* Fix tests/security/test_kerberos.py

* Fix tests/test_configuration.py

* Fix tests/test_logging_config.py

* Fix tests/utils/test_dag_processing.py

* Apply isort

* Fix tests/utils/test_email.py

* Fix tests/utils/test_task_handler_with_custom_formatter.py

* Fix tests/www/api/experimental/test_kerberos_endpoints.py

* Fix tests/www/test_views.py

* Code refactor

* Fix tests/www/api/experimental/test_kerberos_endpoints.py

* Fix requirements

* fixup! Fix tests/www/test_views.py
kaxil and others added 14 commits May 21, 2020 10:51
…8910)

Currently there is no way to determine the state of a TaskInstance in the graph view or tree view for people with colour blindness

Approximately 4.5% of people experience some form of colour vision deficiency
The singularity operator tests _have always_ used mocking, so we were
adding 700MB to our docker image for nothing.

Fixes #8774
CSRF_ENABLED does nothing.

Thankfully, due to sensible defaults in flask-wtf, CSRF is on by
default, but we should set this correctly.

Fixes #8915
All PRs will use the cached "latest good" version of the python
base images from our GitHub registry. The python versions in
the GitHub Registry will only get updated after a master
build (which pulls the latest Python image from DockerHub) builds
and passes tests correctly.

This is to avoid problems that we had recently with Python
patchlevel releases breaking our Docker builds.
…c dag (#8952)

The scheduler_dag_execution_timing script wants to run _n_ dag runs to
completion. However since the start date of those dags is Dynamic (`now
- delta`) we can't pre-compute the execution_dates like we were before.
(This is because the execution_date of the very first dag run would be
`now()` of the parser process, but if we try to pre-compute that in
the benchmark process it would see a different value of now().)

This PR changes it to instead watch for the first _n_ dag runs to be
completed. This should make it work with more dags with fewer changes
to them.
* Push CI images to Docker package cache for v1-10 branches

This is done as a commit to master so that we can keep the two branches
in sync

Co-Authored-By: Ash Berlin-Taylor <[email protected]>

* Run Github Actions against v1-10-stable too

Co-authored-by: Ash Berlin-Taylor <[email protected]>
`field_path` was renamed to `tag_template_field_path` in >=0.8 and there might be other unknown errors
* [AIRFLOW-5262] Update timeout exception to include dag

* PR comment: extract dag id in log to variable
@boring-cyborg boring-cyborg bot added area:CLI area:dev-tools area:Scheduler including HA (high availability) scheduler labels May 22, 2020
@boring-cyborg

boring-cyborg bot commented May 22, 2020

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, pylint and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally; it's a heavy Docker image, but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: [email protected]
    Slack: https://apache-airflow-slack.herokuapp.com/

@ad-m
Contributor

ad-m commented May 23, 2020

Could you fix source & target branch?

@kaxil
Member

kaxil commented May 25, 2020

@U-Ozdemir Can you create a new PR against master, please?
