Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-3680: [Python] allow to disable name validation #1995

Conversation

jjaakola-aiven
Copy link
Contributor

Contribution Checklist

What is the purpose of the change

https://issues.apache.org/jira/browse/AVRO-3680

For interoperability the Python parsing API allows disabling the name validation. It is possible to create schema with names having e.g. dashes from Java SDK. The Python name validation is stricter and follows the Avro specification.

Verifying this change

This change added tests and can be verified as follows:

  • Moved existing tests for Name and Names to /lang/py/avro/test/test_name.py.
  • Added tests for invalid names when parsing with new validate_names flag set to false.

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? not documented

@github-actions github-actions bot added the Python label Dec 2, 2022
@RyanSkraba RyanSkraba self-requested a review December 2, 2022 17:11
@RyanSkraba
Copy link
Contributor

N.B. One of the build failures is due to https://issues.apache.org/jira/browse/AVRO-3681

For interoperability the Python parsing API allows disabling the
name validation. It is possible to create schema with names having
e.g. dashes from Java SDK. The Python name validation is stricter
and follows the Avro specification.
@jjaakola-aiven jjaakola-aiven force-pushed the AVRO-3680-python-allow-disabling-name-validation branch from 07de050 to f1aa540 Compare December 5, 2022 10:54
@NotoriousPyro
Copy link

If the python validation is stricter, is the bug not with Java SDK? (which should be more strict to prevent creating such names)

It seems the lang/js and a few others also restrict from using hyphens and the like. They too would need updating to allow hyphens, no?

@jjaakola-aiven
Copy link
Contributor Author

@NotoriousPyro I do also feel that Java is too lenient and should follow the spec. The actual issue is that there are existing schemas in the wild with names that are not compliant. So the workaround to disable name validation is needed. For new data I think the validation should be enabled, also on Java SDK.

Copy link
Contributor

@RyanSkraba RyanSkraba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM -- thanks for taking the time to find all these places!

@RyanSkraba RyanSkraba merged commit 012338f into apache:master Dec 8, 2022
@RyanSkraba
Copy link
Contributor

@NotoriousPyro There is a current discussion on the mailing list noted in the JIRA -- the issue is that some systems rely on the "tolerance" that has existed for non-compliant names in the Java SDK.

To guarantee interoperability today, the validate_names option should definitely be set to True (at time of writing).

RyanSkraba pushed a commit that referenced this pull request Dec 8, 2022
For interoperability the Python parsing API allows disabling the
name validation. It is possible to create schema with names having
e.g. dashes from Java SDK. The Python name validation is stricter
and follows the Avro specification.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants