Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Unions in schemas and validation #1227

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

karajan1001
Copy link
Contributor

@karajan1001 karajan1001 commented Jun 20, 2023

fix: #1152
Make Pandera to support Union Type. That is the validation of a Series/Column should allow multiple types.

  1. Add a new PythonUnion type.
  2. Add a new test for the new UnionType.

It matches all of the requires in the original #1152

  1. if the column is str dtype, then pass
  2. if the column is float dtype, then pass
  3. if the column is object data type, check that values are str or float. If so, then pass.
  4. fail if none of the above conditions are met.

And I'm not sure if the tests I added are enough because I didn't know where to add a negative test. Besides this mypy will fail and I don't know how to handle it.

@cosmicBboy
Copy link
Collaborator

hi @karajan1001 thanks for the contribution! Check out the contribution docs to learn how to run all the linters and tests locally to make sure things are working: https://pandera.readthedocs.io/en/stable/CONTRIBUTING.html

@karajan1001 karajan1001 force-pushed the fix1152 branch 3 times, most recently from 4453746 to 091335b Compare June 24, 2023 03:25
@codecov
Copy link

codecov bot commented Jun 24, 2023

Codecov Report

Attention: Patch coverage is 94.11765% with 1 line in your changes missing coverage. Please review.

Project coverage is 93.72%. Comparing base (19cc15d) to head (273c49b).
Report is 221 commits behind head on main.

Files with missing lines Patch % Lines
pandera/engines/pandas_engine.py 94.11% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1227       /-   ##
==========================================
- Coverage   97.23%   93.72%   -3.51%     
==========================================
  Files          65       90       25     
  Lines        5066     6714     1648     
==========================================
  Hits         4926     6293     1367     
- Misses        140      421      281     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@karajan1001
Copy link
Contributor Author

**codecov/patch ** — 77.27% of diff hit (target 97.23%)

The coverage failure is what I mentioned in previous that I didn't find a place to add some negative test examples.

@karajan1001
Copy link
Contributor Author

@cosmicBboy

Excuse me, where should I add the negative tests to increase the coverage?

@cosmicBboy
Copy link
Collaborator

hi @karajan1001, you can find the parts of the code changes that were not covered in the tests in the "Files changed" tab, e.g. here: https://github.com/unionai-oss/pandera/pull/1227/files#diff-2691db6e54a8a776ed24713a7fa20bc1b86b6361e8a7bf7323722e301c359a76R1357

Copy link
Collaborator

@cosmicBboy cosmicBboy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we delete the Untitled.ipynb notebook?

try:
pandera_dtype = Engine.dtype(pandera_dtype)
except TypeError:
return False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel free to create new tests in the test_dtypes.py file to cover the cases that codecov is complaining about.

fix: unionai-oss#1152
I would like pandera to support Union Type. That is the validation of a
Series/Column should allow multiple types.

1. Add a new PythonUnion type.
2. Add a new test to for the new UnionType.

Signed-off-by: karajan1001 <[email protected]>
if pandera_dtype not in map(Engine.dtype, pandas_types):
return False

if data_container is None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karajan1001 can we add a test case for when data_container is None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, @cosmicBboy .I'm sorry I do not know how to create a test case in which data_container is None.
I tried
"""
pd.DataDrame{[]}
"""
But it just put an empty pd.Series in.

@jjfantini
Copy link

any update on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for multi type (Unions) in schemas and validation
3 participants