spark: add GCP run and job facets #2643

Merged
merged 14 commits into OpenLineage:main from feature/gcp-facets
Jun 18, 2024

Conversation

codelixir
Contributor

@codelixir codelixir commented Apr 25, 2024

Problem

Adds GCPRunFacetBuilder and GCPJobFacetBuilder to collect GCP-specific properties at run and job levels respectively.

Closes: #2641

Solution

The SPARK_DIST_CLASSPATH property is used to determine whether the job is running in a GCP Dataproc environment. If it is, the respective facet builders are added to the list of builders. The remaining properties are obtained from Google's compute metadata API.
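
As a rough illustration of the approach described above (not the code merged in this PR), the sketch below checks SPARK_DIST_CLASSPATH, decides which builders to register, and prepares a metadata-server request. The class name `GcpDetectionSketch`, its method names, and the "contains dataproc" substring rule are assumptions made for this example; only the metadata server base URL and the `Metadata-Flavor: Google` header come from Google's documented metadata API.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not the PR's GCPUtils class.
final class GcpDetectionSketch {
  // Base URL of the Compute Engine metadata server.
  static final String METADATA_BASE = "http://metadata.google.internal/computeMetadata/v1/";

  private GcpDetectionSketch() {}

  // ASSUMPTION: a Dataproc classpath mentions "dataproc" somewhere.
  // The exact rule used by the PR is not shown in this conversation.
  static boolean isDataprocRuntime(Map<String, String> env) {
    String classpath = env.get("SPARK_DIST_CLASSPATH");
    return classpath != null && classpath.toLowerCase(Locale.ROOT).contains("dataproc");
  }

  // Returns the names of the builders that would be registered.
  // Names stand in for the real builder instances to keep the sketch self-contained.
  static List<String> facetBuilderNames(Map<String, String> env) {
    List<String> builders = new ArrayList<>();
    if (isDataprocRuntime(env)) {
      builders.add("GCPRunFacetBuilder");
      builders.add("GCPJobFacetBuilder");
    }
    return builders;
  }

  // Builds (but does not send) a metadata request; the server rejects
  // requests that lack the Metadata-Flavor header.
  static HttpRequest metadataRequest(String path) {
    return HttpRequest.newBuilder(URI.create(METADATA_BASE + path))
        .header("Metadata-Flavor", "Google")
        .timeout(Duration.ofSeconds(2))
        .GET()
        .build();
  }
}
```

Passing the environment in as a map (for example `System.getenv()` in production) keeps the detection logic testable without a Dataproc cluster, and building the request separately from sending it lets tests point at a mock server instead of the real metadata endpoint.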

  • Your change modifies the core OpenLineage model
  • Your change modifies one or more OpenLineage facets

One-line summary:

Adds GCPRunFacetBuilder and GCPJobFacetBuilder to report additional facets when running on Google Cloud Platform

Checklist

  • You've signed-off your work
  • Your pull request title follows our guidelines
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • Your comment includes a one-liner for the changelog about the specific purpose of the change (if necessary)
  • You've versioned the core OpenLineage model or facets according to SchemaVer (if relevant)
  • You've added a header to source files (if relevant)

SPDX-License-Identifier: Apache-2.0
Copyright 2018-2023 contributors to the OpenLineage project

Signed-off-by: Pahulpreet Singh <[email protected]>
Signed-off-by: Pahulpreet Singh <[email protected]>
@codelixir codelixir force-pushed the feature/gcp-facets branch from dde2159 to a98ff94 Compare April 29, 2024 05:31
Signed-off-by: Pahulpreet Singh <[email protected]>
Collaborator

@pawel-big-lebowski pawel-big-lebowski left a comment

Great piece of code. Thank you for contributing. Please look at my questions and comments below.

Signed-off-by: Pahulpreet Singh <[email protected]>
Signed-off-by: Pahulpreet Singh <[email protected]>
Signed-off-by: Pahulpreet Singh <[email protected]>
@codelixir codelixir changed the title spark: add GCP environment facet spark: add GCP run and job facets Jun 14, 2024
@boring-cyborg boring-cyborg bot added the language:java Uses Java programming language label Jun 14, 2024
@codelixir codelixir force-pushed the feature/gcp-facets branch from c7d93bc to 77d78b1 Compare June 14, 2024 06:33
@codelixir
Contributor Author

Hi, I have modified the run facet according to #2771 and also added the job facet proposed in #2740.

@codelixir codelixir force-pushed the feature/gcp-facets branch from 77d78b1 to 20b197a Compare June 14, 2024 06:47
Signed-off-by: Pahulpreet Singh <[email protected]>
Signed-off-by: Pahulpreet Singh <[email protected]>
@codelixir codelixir requested a review from a team as a code owner June 17, 2024 14:04
@boring-cyborg boring-cyborg bot added the area:tests Testing code label Jun 17, 2024
Signed-off-by: Pahulpreet Singh <[email protected]>
Signed-off-by: Pahulpreet Singh <[email protected]>
@codelixir
Contributor Author

The CI tests which are failing were taken care of in #2655

Collaborator

@pawel-big-lebowski pawel-big-lebowski left a comment

Great piece of work. Thank you for contributing.

@pawel-big-lebowski
Collaborator

The CI tests which are failing were taken care of in #2655

Could you then rebase your changes?

@pawel-big-lebowski pawel-big-lebowski merged commit 55bf013 into OpenLineage:main Jun 18, 2024
34 checks passed

boring-cyborg bot commented Jun 18, 2024

Great job! Congrats on your first merged pull request in OpenLineage!

codelixir added a commit to codelixir/OpenLineage that referenced this pull request Jun 20, 2024
Signed-off-by: Pahulpreet Singh <[email protected]>
mobuchowski pushed a commit that referenced this pull request Jun 20, 2024
* Limit logging from MockServer interactions in GCPUtilsTest

Signed-off-by: Pahulpreet Singh <[email protected]>

* update changelog for #2643

Signed-off-by: Pahulpreet Singh <[email protected]>

---------

Signed-off-by: Pahulpreet Singh <[email protected]>
fafnirZ pushed a commit to fafnirZ/OpenLineage that referenced this pull request Jul 3, 2024
* spark: add GCP environment facet

Signed-off-by: Pahulpreet Singh <[email protected]>

* apply spotless check

Signed-off-by: Pahulpreet Singh <[email protected]>

* spotless check

Signed-off-by: Pahulpreet Singh <[email protected]>

* address comments

Signed-off-by: Pahulpreet Singh <[email protected]>

* Revise GCP Dataproc Run Facet

Signed-off-by: Pahulpreet Singh <[email protected]>

* add GCP common Job Facet

Signed-off-by: Pahulpreet Singh <[email protected]>

* update GCP Dataproc facet as per OpenLineage#2771

Signed-off-by: Pahulpreet Singh <[email protected]>

* Update schema URL for GCP facets as per the new registry

Signed-off-by: Pahulpreet Singh <[email protected]>

* fix PMD violations

Signed-off-by: Pahulpreet Singh <[email protected]>

* Writing tests for GCPUtils

Signed-off-by: Pahulpreet Singh <[email protected]>

* add license header

Signed-off-by: Pahulpreet Singh <[email protected]>

* fix pmd violations

Signed-off-by: Pahulpreet Singh <[email protected]>

* Resolve merge conflicts and bugs

Signed-off-by: Pahulpreet Singh <[email protected]>

---------

Signed-off-by: Pahulpreet Singh <[email protected]>
fafnirZ pushed a commit to fafnirZ/OpenLineage that referenced this pull request Jul 3, 2024
* Limit logging from MockServer interactions in GCPUtilsTest

Signed-off-by: Pahulpreet Singh <[email protected]>

* update changelog for OpenLineage#2643

Signed-off-by: Pahulpreet Singh <[email protected]>

---------

Signed-off-by: Pahulpreet Singh <[email protected]>
@codelixir codelixir deleted the feature/gcp-facets branch November 14, 2024 08:30
Labels
area:integration/spark area:tests Testing code language:java Uses Java programming language
Projects
None yet
2 participants