-
Notifications
You must be signed in to change notification settings - Fork 315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add GCP Dataplex transport as separate module #3043
Add GCP Dataplex transport as separate module #3043
Conversation
d3a80e6
to
d436ddb
Compare
@ddebowczyk92 |
d436ddb
to
efe19ae
Compare
3942293
to
1c4d9e8
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #3043 /- ##
=======================================
Coverage 85.70% 85.70%
=======================================
Files 54 54
Lines 3112 3112
=======================================
Hits 2667 2667
Misses 445 445 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is: is this code specific to OpenLineage or certain lineage backend? In case of the latter: are we happy to build and release transports for all the other OL backends in case someone wants it? What are the advantages of openlineage community taking care of this?
The disadvantage of this approach is that: we take responsibility for the code without being able to verify its correctness. One can easily make a change within DataplexTransport
which breaks the integration while passing all the tests. It's difficult to maintain such package, as even upgrading dependencies like `com.google.cloud:google-cloud-datalineage' is risky.
If we want to maintain such code, I think we need an integration test with real DataPlex credentials, same as we have for BigQuery.
1c4d9e8
to
0810f59
Compare
I believe there is no problem with us maintaining wrappers (transports are wrappers for specific libraries) for sending events to significant OpenLineage backends. I don't see a difference between DataplexTransport and KafkaTransport here for example, given that they both produce consumable JAR that the transport uses. Adding this as a separate releasable Maven package is actually a step forward for modularity: even if something breaks inside, it affects only users that use it - it's not the case with
I don't agree with this, the analogy is comparable to KafkaTransport. We don't test if
@tnazarew is writing integration consumer tests that would be able to cover whether particular transport produces information that's relevant to the particular consumer. |
0810f59
to
40ad6a7
Compare
### Configuration | ||
|
||
- `type` - string, must be `"dataplex"`. Required. | ||
- `endpoint` - string, specifies the endpoint to which events are sent. Optional. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provide the format for the endpoint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@midamkina Is the endpoint config redundant with the ProjectId Location combination?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, there can be a different environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@midamkina Is the endpoint of the form projects/{project}/locations/{location}/
? If so, is the distinction here only about the service's project vs the billing project?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that projects/{project}/locations/{location}/ is the parent. Endpoint has an address that can point to a different environment.
...orts-dataplex/src/main/java/io/openlineage/client/transports/dataplex/DataplexTransport.java
Outdated
Show resolved
Hide resolved
...orts-dataplex/src/main/java/io/openlineage/client/transports/dataplex/DataplexTransport.java
Outdated
Show resolved
Hide resolved
...orts-dataplex/src/main/java/io/openlineage/client/transports/dataplex/DataplexTransport.java
Outdated
Show resolved
Hide resolved
...nsports-dataplex/src/main/java/io/openlineage/client/transports/dataplex/DataplexConfig.java
Outdated
Show resolved
Hide resolved
...nsports-dataplex/src/main/java/io/openlineage/client/transports/dataplex/DataplexConfig.java
Show resolved
Hide resolved
aacf882
to
318c25e
Compare
...orts-dataplex/src/main/java/io/openlineage/client/transports/dataplex/DataplexTransport.java
Outdated
Show resolved
Hide resolved
...nsports-dataplex/src/main/java/io/openlineage/client/transports/dataplex/DataplexConfig.java
Show resolved
Hide resolved
318c25e
to
9be54af
Compare
Signed-off-by: Dominik Dębowczyk <[email protected]>
Signed-off-by: Dominik Dębowczyk <[email protected]>
Signed-off-by: Dominik Dębowczyk <[email protected]>
Signed-off-by: Dominik Dębowczyk <[email protected]>
Signed-off-by: Dominik Dębowczyk <[email protected]>
Signed-off-by: Dominik Dębowczyk <[email protected]>
9be54af
to
724b320
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR adds
GCP
transport as separate module.If you're contributing a new integration, please specify the scope of the integration and how/where it has been tested (e.g., Apache Spark integration supports
S3
andGCS
filesystem operations, tested with AWS EMR).Checklist
SPDX-License-Identifier: Apache-2.0
Copyright 2018-2024 contributors to the OpenLineage project