-
Notifications
You must be signed in to change notification settings - Fork 317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issue with only one table in inputs from sql query while reading from jdbc. #2918
Fix issue with only one table in inputs from sql query while reading from jdbc. #2918
Conversation
54b1ab0
to
0ffd1b2
Compare
@Tag("integration-test") | ||
@Testcontainers | ||
@Slf4j | ||
class InvalidTablesInJdbcInputsTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this test really verify invalid tables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not good name for test :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
runEvents.stream() | ||
.map(OpenLineage.RunEvent::getInputs) | ||
.forEach( | ||
inputs -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we extracting schemas in this case? If not, then it's at least worth writing a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its interesting how we can handle schema when we have more than one table, what we can do is to somehow parse the query and look for column references but I am not sure if its the case here and for simplicity is better to keep the schema to be empty StructType ?
@@ -39,13 39,26 @@ public static <D extends OpenLineage.Dataset> List<D> getDatasets( | |||
String jdbcUrl = relation.jdbcOptions().url(); | |||
Properties jdbcProperties = relation.jdbcOptions().asConnectionProperties(); | |||
|
|||
if (meta.columnLineage().isEmpty() && meta.inTables().size() > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how come empty column level lineage makes here a difference? Could you add some comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ve tried to not make any regression. It might be that the condition is not necessary in this form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ve changed that a little in newest commit, the issue is with how we handle the schema when we have more than one table and we dont have column lineage. In that case we need for each column to set schema to empty StructType or parse plan (but I think thats what column lineage will at the end ll do)
d7b688c
to
dff0c5d
Compare
…from jdbc. Signed-off-by: pawelkocinski <[email protected]>
Signed-off-by: pawelkocinski <[email protected]>
Signed-off-by: pawelkocinski <[email protected]>
dff0c5d
to
e619b82
Compare
return datasetFactory.getDataset( | ||
di.getName(), | ||
di.getNamespace(), | ||
numberOfTables == 1 ? schema : new StructType()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this result in schema facet with empty fields? If so, this is something that should be avoided.
If any other OL event emits dataset with schema facet, the schema will be attached to a dataset up until the moment a new schema facet is sent. Sending empty schema facet will clear existing information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That example is only possible to happen when we have no column lineage number of tables in the jdbc query is more than one. Maybe instead this solution we should take one step backward and add column level lineage for jdbc queries with with statement ? I don't see any better solution without stepping back than putting the empty Schema field. Do you have any ideas ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding new method io.openlineage.spark.api.DatasetFactory#getDataset(java.lang.String, java.lang.String)
without StructType schema
parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Signed-off-by: pawelkocinski <[email protected]>
Signed-off-by: pawelkocinski <[email protected]>
Signed-off-by: pawelkocinski <[email protected]>
…from jdbc. (#2918) * Fix issue with only one table in inputs from sql query while reading from jdbc. Signed-off-by: pawelkocinski <[email protected]> --------- Signed-off-by: pawelkocinski <[email protected]>
Problem
Closes: #2808
Solution
Take more elements from array than the first one.
One-line summary:
Checklist
SPDX-License-Identifier: Apache-2.0
Copyright 2018-2024 contributors to the OpenLineage project