[SPARK] fix drop table for Spark 3.4 and higher #2745

Merged
pawel-big-lebowski merged 1 commit into main from spark/drop-table-visitor
May 31, 2024

pawel-big-lebowski (Collaborator) commented May 31, 2024

Problem

Drop table does not emit OpenLineage (OL) events properly on Spark 3.4 and higher.

Closes: #2716

Solution

Modify the visitor/dataset builder and enable the tests that verify the output dataset.
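
A minimal sketch of the idea behind a version-tolerant drop-table visitor, not the code from this PR. It assumes (the PR does not state this) that different Spark versions hand the visitor differently shaped plan children, so the builder has to recognise each shape and still emit one output dataset marked as dropped. The types `PlanChild`, `TableChild`, `IdentifierChild`, and `OutputDataset` are hypothetical stand-ins, not Spark or OpenLineage classes; `DROP` is one of the values of OpenLineage's lifecycle state change facet.

```java
import java.util.ArrayList;
import java.util.List;

class DropTableVisitorSketch {

    /** Hypothetical stand-in for the child of a drop-table plan node; not a Spark class. */
    interface PlanChild {}

    /** Assumed shape A: the child already carries a fully named table. */
    static final class TableChild implements PlanChild {
        final String namespace;
        final String tableName;

        TableChild(String namespace, String tableName) {
            this.namespace = namespace;
            this.tableName = tableName;
        }
    }

    /** Assumed shape B: the child carries only an identifier (database plus table). */
    static final class IdentifierChild implements PlanChild {
        final String namespace;
        final String database;
        final String table;

        IdentifierChild(String namespace, String database, String table) {
            this.namespace = namespace;
            this.database = database;
            this.table = table;
        }
    }

    /** Simplified output dataset; the real OpenLineage model has more fields. */
    static final class OutputDataset {
        final String namespace;
        final String name;
        final String lifecycleStateChange;

        OutputDataset(String namespace, String name, String lifecycleStateChange) {
            this.namespace = namespace;
            this.name = name;
            this.lifecycleStateChange = lifecycleStateChange;
        }
    }

    /** Builds the output datasets for a drop-table node, whichever child shape it has. */
    static List<OutputDataset> apply(PlanChild child) {
        List<OutputDataset> datasets = new ArrayList<>();
        if (child instanceof TableChild) {
            TableChild t = (TableChild) child;
            datasets.add(new OutputDataset(t.namespace, t.tableName, "DROP"));
        } else if (child instanceof IdentifierChild) {
            IdentifierChild id = (IdentifierChild) child;
            datasets.add(new OutputDataset(id.namespace, id.database + "." + id.table, "DROP"));
        }
        // Unknown shapes yield no dataset instead of throwing.
        return datasets;
    }

    public static void main(String[] args) {
        PlanChild child = new IdentifierChild("file:/warehouse", "default", "t1");
        for (OutputDataset d : apply(child)) {
            System.out.println(d.namespace + " " + d.name + " " + d.lifecycleStateChange);
        }
    }
}
```

A test that verifies the output dataset would then assert on the list returned by `apply` for each child shape, which is the kind of check the re-enabled tests are described as performing.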

Note: All schema changes require discussion. Please link the issue for context.

  • Your change modifies the core OpenLineage model
  • Your change modifies one or more OpenLineage facets

If you're contributing a new integration, please specify the scope of the integration and how/where it has been tested (e.g., Apache Spark integration supports S3 and GCS filesystem operations, tested with AWS EMR).

One-line summary:

Checklist

  • You've signed-off your work
  • Your pull request title follows our guidelines
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • Your comment includes a one-liner for the changelog about the specific purpose of the change (not required for changes to tests, docs, or CI config)
  • You've versioned the core OpenLineage model or facets according to SchemaVer (if relevant)
  • You've added a header to source files (if relevant)

SPDX-License-Identifier: Apache-2.0
Copyright 2018-2024 contributors to the OpenLineage project

@boring-cyborg boring-cyborg bot added area:integration/spark area:tests Testing code language:java Uses Java programming language labels May 31, 2024
@pawel-big-lebowski pawel-big-lebowski force-pushed the spark/drop-table-visitor branch 4 times, most recently from 48713e9 to caa5841 Compare May 31, 2024 07:15
@boring-cyborg boring-cyborg bot added the area:documentation Improvements or additions to documentation label May 31, 2024
@pawel-big-lebowski pawel-big-lebowski marked this pull request as ready for review May 31, 2024 07:20
@pawel-big-lebowski pawel-big-lebowski merged commit dc880df into main May 31, 2024
33 checks passed
@pawel-big-lebowski pawel-big-lebowski deleted the spark/drop-table-visitor branch May 31, 2024 11:03
ngorchakova pushed a commit to ngorchakova/OpenLineage that referenced this pull request Jun 11, 2024
Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Natalia Gorchakova <[email protected]>
harels pushed a commit that referenced this pull request Jun 11, 2024
* Register GCP common job facet

Signed-off-by: Natalia Gorchakova <[email protected]>

* [SPARK] verify jar content after build (#2698)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Natalia Gorchakova <[email protected]>

* Apply prettier for json

Signed-off-by: Natalia Gorchakova <[email protected]>

* Ignore registry.json files by generator

Signed-off-by: Natalia Gorchakova <[email protected]>

* Spark: Fix historyUrl format (#2741)

Signed-off-by: Martynov Maxim <[email protected]>
Signed-off-by: Natalia Gorchakova <[email protected]>

* Add Atlan as OpenLineage contributor (#2742)

Signed-off-by: Kacper Muda <[email protected]>
Signed-off-by: Natalia Gorchakova <[email protected]>

* [SPARK] fix drop table for Spark 3.4 and higher (#2745)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Natalia Gorchakova <[email protected]>

* spark: make sure debug logging is guarded when it can cause function call (#2744)

Signed-off-by: Maciej Obuchowski <[email protected]>
Signed-off-by: Natalia Gorchakova <[email protected]>

* Bump org.assertj:assertj-core from 3.25.3 to 3.26.0 in /client/java (#2747)

Bumps [org.assertj:assertj-core](https://github.com/assertj/assertj) from 3.25.3 to 3.26.0.
- [Release notes](https://github.com/assertj/assertj/releases)
- [Commits](assertj/assertj@assertj-build-3.25.3...assertj-build-3.26.0)

---
updated-dependencies:
- dependency-name: org.assertj:assertj-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Natalia Gorchakova <[email protected]>

* [SPARK] fix NPE in column level lineage (#2749)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Natalia Gorchakova <[email protected]>

* Update the name for facet;
Update procedure to publish facets to documentation: ignore registry.json files

Signed-off-by: Natalia Gorchakova <[email protected]>

* alias: allow self-recursive aliases (#2753)

Signed-off-by: Maciej Obuchowski <[email protected]>
Signed-off-by: Natalia Gorchakova <[email protected]>

* fix changelog (#2759)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Natalia Gorchakova <[email protected]>

* [SPARK] refactor OpenLineageRunEventBuilder (#2754)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Natalia Gorchakova <[email protected]>

* Bump mypy to 1.10. (#2760)

Use attr scope so that attributes named `field` do not break static checks.

Signed-off-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Natalia Gorchakova <[email protected]>

* Dataset host resolver feature (#2720)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Natalia Gorchakova <[email protected]>

* remodeled transformation type (#2756)

* generated python client, backward compatibility fixed

Signed-off-by: tnazarew <[email protected]>

Update class after #2760 fix.

Signed-off-by: Jakub Dardzinski <[email protected]>

* move required fields into transformation object

Signed-off-by: tnazarew <[email protected]>

Co-authored-by: Jakub Dardzinski <[email protected]>

* update python classes

Signed-off-by: tnazarew <[email protected]>

* type changed to DIRECT|INDIRECT

Signed-off-by: tnazarew <[email protected]>

* add changelog

Signed-off-by: tnazarew <[email protected]>

* change deprecation and fix changelog

Signed-off-by: tnazarew <[email protected]>

* add deprecated field info to changelog

Signed-off-by: tnazarew <[email protected]>

* fix redact_fields for Transformation

Signed-off-by: tnazarew <[email protected]>

* updated generated python class

Signed-off-by: tnazarew <[email protected]>

---------

Signed-off-by: Jakub Dardzinski <[email protected]>
Signed-off-by: tnazarew <[email protected]>
Co-authored-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Natalia Gorchakova <[email protected]>

---------

Signed-off-by: Natalia Gorchakova <[email protected]>
Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Martynov Maxim <[email protected]>
Signed-off-by: Kacper Muda <[email protected]>
Signed-off-by: Maciej Obuchowski <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Jakub Dardzinski <[email protected]>
Signed-off-by: tnazarew <[email protected]>
Co-authored-by: pawel.leszczynski <[email protected]>
Co-authored-by: Maxim Martynov <[email protected]>
Co-authored-by: Kacper Muda <[email protected]>
Co-authored-by: Maciej Obuchowski <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jakub Dardzinski <[email protected]>
Co-authored-by: tnazarew <[email protected]>
fafnirZ pushed a commit to fafnirZ/OpenLineage that referenced this pull request Jul 3, 2024
fafnirZ pushed a commit to fafnirZ/OpenLineage that referenced this pull request Jul 3, 2024
Labels
area:documentation (Improvements or additions to documentation)
area:integration/spark
area:tests (Testing code)
language:java (Uses Java programming language)
Development

Successfully merging this pull request may close these issues.

[BUG]- drop event is not received from spark 3.4.2 and 3.5.0
3 participants