spark: add GCP run and job facets #2643
Signed-off-by: Pahulpreet Singh <[email protected]>
dde2159 to a98ff94
Great piece of code. Thank you for contributing. Please look at my questions and comments below.
integration/spark/shared/src/main/java/io/openlineage/spark/agent/facets/EnvironmentFacet.java
integration/spark/shared/src/main/java/io/openlineage/spark/agent/util/GCPUtils.java
c7d93bc to 77d78b1
77d78b1 to 20b197a
The failing CI tests were taken care of in #2655.
Great piece of work. Thank you for contributing.
Could you then rebase your changes?
Great job! Congrats on your first merged pull request in OpenLineage!
* Limit logging from MockServer interactions in GCPUtilsTest
* update changelog for #2643
---------
Signed-off-by: Pahulpreet Singh <[email protected]>
* spark: add GCP environment facet
* apply spotless check
* spotless check
* address comments
* Revise GCP Dataproc Run Facet
* add GCP common Job Facet
* update GCP Dataproc facet as per OpenLineage#2771
* Update schema URL for GCP facets as per the new registry
* fix PMD violations
* Writing tests for GCPUtils
* add license header
* fix pmd violations
* Resolve merge conflicts and bugs
---------
Signed-off-by: Pahulpreet Singh <[email protected]>
Problem
Adds GCPRunFacetBuilder and GCPJobFacetBuilder to collect GCP-specific properties at run and job levels respectively.
Closes: #2641
Solution
The SPARK_DIST_CLASSPATH property can be used to determine whether the job is running in a GCP Dataproc environment; based on that, we decide whether to add the respective facet builders to the list. Other properties are obtained from Google's compute metadata API.
One-line summary:
Adds GCPRunFacetBuilder and GCPJobFacetBuilder to report additional facets when running on Google Cloud Platform.
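For readers who want a concrete picture of the approach described under Solution, here is a minimal sketch of the two steps: checking SPARK_DIST_CLASSPATH for a Dataproc marker, then reading a value from the Compute Engine metadata server. It is not the code merged in this PR. The class and method names are hypothetical, the marker path is an assumption about what a Dataproc classpath contains, and reading SPARK_DIST_CLASSPATH as an environment variable is also an assumption. The metadata base URL and the `Metadata-Flavor: Google` header are the ones Google documents for the metadata server.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Illustrative sketch only; names here are hypothetical, not taken from the PR. */
public class GcpDetectionSketch {

  // ASSUMPTION: a substring expected in SPARK_DIST_CLASSPATH on Dataproc nodes.
  public static final String DATAPROC_MARKER = "/usr/local/share/google/dataproc/lib";

  // Documented base URL of the Compute Engine metadata server.
  public static final String METADATA_BASE =
      "http://metadata.google.internal/computeMetadata/v1/";

  /** Returns true when the given classpath value looks like a Dataproc one. */
  public static boolean isDataproc(String sparkDistClasspath) {
    return sparkDistClasspath != null && sparkDistClasspath.contains(DATAPROC_MARKER);
  }

  /** Reads the first line of a metadata entry; empty on any failure or non-200 response. */
  public static Optional<String> fetchMetadata(String baseUrl, String path) {
    try {
      HttpURLConnection conn = (HttpURLConnection) new URL(baseUrl + path).openConnection();
      // The metadata server rejects requests that lack this header.
      conn.setRequestProperty("Metadata-Flavor", "Google");
      conn.setConnectTimeout(1000);
      conn.setReadTimeout(1000);
      if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
        return Optional.empty();
      }
      try (BufferedReader reader =
          new BufferedReader(
              new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
        return Optional.ofNullable(reader.readLine());
      }
    } catch (IOException e) {
      // Outside GCP the metadata host does not resolve; treat that as "no value".
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    // ASSUMPTION: SPARK_DIST_CLASSPATH is read from the process environment.
    if (isDataproc(System.getenv("SPARK_DIST_CLASSPATH"))) {
      String projectId = fetchMetadata(METADATA_BASE, "project/project-id").orElse("unknown");
      System.out.println("Dataproc detected, project: " + projectId);
    } else {
      System.out.println("Not on Dataproc; GCP facet builders would be skipped");
    }
  }
}
```

Because the sketch returns an empty Optional when a lookup fails, a facet builder written along these lines could leave out a missing property rather than fail the Spark job.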
SPDX-License-Identifier: Apache-2.0
Copyright 2018-2023 contributors to the OpenLineage project