Pandera's check doesn't work after bundling #3035

Open
zbouslikhin opened this issue Jul 30, 2024 · 4 comments
Labels: bug, help wanted (Please help with this, we think you can)

Comments

@zbouslikhin

  • Nuitka version, full Python version, flavor, OS, etc. as output by this exact command: python -m nuitka --version
2.4.2
Commercial: None
Python: 3.11.2 (tags/v3.11.2:878ead1, Feb  7 2023, 16:38:35) [MSC v.1934 64 bit (AMD64)]
Flavor: Unknown
Executable: C:\Users\mgme2\AppData\Local\pypoetry\Cache\virtualenvs\vizprocessing-m5tfYRZH-py3.11\Scripts\python.exe
OS: Windows
Arch: x86_64
WindowsRelease: 10
Version C compiler: C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.36.32532\bin\Hostx64\x64\cl.exe (cl 14.3).
  • How did you install Nuitka and Python
    Python installed through pyenv-win
    Nuitka installed via poetry add nuitka

  • The specific PyPI names and versions

    poetry show

nuitka                    2.4.2          Python compiler with full language support and CPython c...
numpy                     1.26.4         Fundamental package for array computing in Python
pandas                    1.5.3          Powerful data structures for data analysis, time series,...
pandera                   0.18.3         A light-weight and flexible data validation and testing ...
pydantic                  2.8.2          Data validation using Python type hints
pydantic-core             2.20.1         Core functionality for Pydantic validation and serializa...
scipy                     1.14.0         Fundamental algorithms for scientific computing in Python
websocket-client          1.8.0          WebSocket client for Python with low level API options
websockets                10.4           An implementation of the WebSocket Protocol (RFC 6455 & ...

Reproducer (test.py):

import pandera as pa
import logging
import pandas as pd
from os.path import isdir

logging.basicConfig(level=logging.DEBUG)

num_records = 10

def is_dir(paths: str | list):
    if isinstance(paths, str):
        if not isdir(paths):
            return False
    elif isinstance(paths, list):
        for path in paths:
            if not isdir(path):
                return False
    return True

DF_CHECK_SCHEMA = pa.DataFrameSchema(
    {
        "save_path": pa.Column(
            pd.StringDtype,
            checks=[
                pa.Check(is_dir, element_wise=True),
            ],
            nullable=False,
            required=True,
        ),
    },
    strict=True,
    coerce=False,
)

DF_NO_CHECK_SCHEMA = pa.DataFrameSchema(
    {
        "save_path": pa.Column(
            pd.StringDtype,
            nullable=False,
            required=True,
        ),
    },
    strict=True,
    coerce=False,
)

data = {
    "save_path": ["C:\\"] * num_records,
}

faked_df = pd.DataFrame({
    "save_path": pd.Series(data["save_path"], dtype="string"),
})

def run_checks():
    try:
        DF_NO_CHECK_SCHEMA.validate(faked_df)   # <-- this works after bundling
        logging.debug("Validated DF WITHOUT any check")
        DF_CHECK_SCHEMA.validate(faked_df) # <-- this doesn't work after bundling
        logging.debug("Validated DF WITH any check")
    except pa.errors.SchemaError as e:
        logging.error(f"Schema validation error: {e}")
        raise

def main():
    run_checks()

if __name__ == '__main__':
    main()

Result:

DEBUG:root:Validated DF WITHOUT any check
ERROR:root:Schema validation error: Error while executing check function: DispatchError("No matching functions found")
Traceback (most recent call last):
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\pandera\backends\pandas\components.py", line 217, in run_checks
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\pandera\backends\pandas\base.py", line 102, in run_check
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\pandera\api\checks.py", line 227, in __call__
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\pandera\backends\pandas\checks.py", line 296, in __call__
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\multimethod\__init__.py", line 432, in __call__
multimethod.DispatchError: No matching functions found

Troubleshooting:
After bundling, an exception is raised as soon as a Pandera check is attached to the column (the checks= field). Without a check, validation works fine.

Nuitka options used:

python -m nuitka --deployment --standalone --onefile --follow-imports --mingw64 --assume-yes-for-downloads --include-module=pydantic.deprecated.decorator test.py

@kayhayen
Member

Bug reports with --deployment should not be made; I need to make that clear in the issue template. People are disabling all automatic bug finding if they do that, and that's not good.

@zbouslikhin
Author

zbouslikhin commented Jul 30, 2024

> Bug reports with --deployment should not be made; I need to make that clear in the issue template. People are disabling all automatic bug finding if they do that, and that's not good.

Makes sense. I ran the same build again, but without the --deployment argument (python -m nuitka --standalone --onefile --follow-imports --mingw64 --assume-yes-for-downloads --include-module=pydantic.deprecated.decorator test.py). Here is the full traceback after running the executable:

DEBUG:root:Validated DF WITHOUT any check
ERROR:root:Schema validation error: Error while executing check function: DispatchError("No matching functions found")
Traceback (most recent call last):
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\pandera\backends\pandas\components.py", line 217, in run_checks
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\pandera\backends\pandas\base.py", line 102, in run_check
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\pandera\api\checks.py", line 227, in __call__
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\pandera\backends\pandas\checks.py", line 296, in __call__
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\multimethod\__init__.py", line 432, in __call__
multimethod.DispatchError: No matching functions found

Traceback (most recent call last):
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\test.py", line 71, in <module>
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\test.py", line 68, in main
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\test.py", line 61, in run_checks
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\pandera\api\pandas\container.py", line 375, in validate
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\pandera\api\pandas\container.py", line 404, in _validate
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\pandera\backends\pandas\container.py", line 100, in validate
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\pandera\backends\pandas\container.py", line 175, in run_checks_and_handle_errors
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\pandera\api\base\error_handler.py", line 54, in collect_error
pandera.errors.SchemaError: Error while executing check function: DispatchError("No matching functions found")
Traceback (most recent call last):
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\pandera\backends\pandas\components.py", line 217, in run_checks
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\pandera\backends\pandas\base.py", line 102, in run_check
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\pandera\api\checks.py", line 227, in __call__
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\pandera\backends\pandas\checks.py", line 296, in __call__
  File "C:\Users\mgme2\AppData\Local\Temp\ONEFIL~1\multimethod\__init__.py", line 432, in __call__
multimethod.DispatchError: No matching functions found

@kayhayen
Member

I added it to the issue template. There was no way it was related to your error; it is more of a principle that you so far had no way of knowing to adhere to, except that the template asks you to minimize option usage.

I looked at anti-bloat and removed dask usage for pandera, and I am now eliminating the need for the deprecated pydantic workaround (--include-module=pydantic.deprecated.decorator); after that I will be able to see what this is about. In the end, something like multimethod must be doing something incompatible.

@kayhayen kayhayen self-assigned this Jul 31, 2024
@kayhayen kayhayen added the bug label Jul 31, 2024
@kayhayen
Member

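# Excerpt from multimethod/__init__.py; `multimethod` below is the library's
# own multimethod class, which must already be bound in that module.
import functools
import inspect

class overload(dict):
    """Ordered functions which dispatch based on their annotated predicates."""

    pending: set
    __get__ = multimethod.__get__

    def __new__(cls, func):
        # Namespace of the code applying the decorator (e.g. a class body).
        namespace = inspect.currentframe().f_back.f_locals
        self = functools.update_wrapper(super().__new__(cls), func)
        self.pending = set()
        # Reuse an existing binding of the same name if present, else self.
        return namespace.get(func.__name__, self)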

This decorator does something unusual. It takes the decorated function's name and looks it up in the caller's frame locals to find a previous definition of the same name, which makes sense. However, with Nuitka the frame locals are not updated for functions, so the lookup can easily fall back to the default, self. Each decorated function then starts its own separate overload, such that the overload cannot work.
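To make the mechanism concrete, here is a minimal pure-Python sketch meant to be run under plain CPython. `Overload`, `Working` and `Broken` are hypothetical names; `Overload` is a heavily simplified stand-in for multimethod.overload that dispatches on the annotated type of a single argument, not the library's real implementation. The `del describe` in `Broken` only mimics frame locals that lack the earlier definition; it is not how Nuitka behaves internally.

```python
import inspect


class Overload(list):
    """Hypothetical, simplified stand-in for multimethod.overload.

    Functions decorated under the same name are collected into one Overload,
    found by looking the name up in the *caller's* frame locals.
    """

    def __new__(cls, func):
        # The caller is the code doing the decorating, e.g. a class body.
        namespace = inspect.currentframe().f_back.f_locals
        existing = namespace.get(func.__name__)
        if isinstance(existing, cls):
            existing.append(func)  # merge with the earlier definition
            return existing
        self = super().__new__(cls)  # lookup missed: start a fresh overload
        self.append(func)
        return self

    def __init__(self, func):
        # All work happens in __new__; skip list.__init__, which would
        # try to iterate over func.
        pass

    def __call__(self, arg):
        # Call the first function whose single annotated type matches.
        for func in self:
            expected = next(iter(func.__annotations__.values()))
            if isinstance(arg, expected):
                return func(arg)
        raise TypeError("No matching functions found")


class Working:
    @Overload
    def describe(arg: int):
        return "int"

    @Overload
    def describe(arg: str):
        return "str"


class Broken:
    @Overload
    def describe(arg: int):
        return "int"

    # Hide the first overload from the namespace, mimicking frame locals
    # that were never updated: the next decoration cannot find it.
    first = describe
    del describe

    @Overload
    def describe(arg: str):
        return "str"


print(Working.describe(1), Working.describe("a"))  # int str
try:
    Broken.describe(1)
except TypeError as exc:
    print(exc)  # No matching functions found
```

In the `Broken` case only the last definition is left to dispatch on, which ends in the same kind of "No matching functions found" failure seen in the traceback.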

It seems it will either need help, or need to be recognized as one of the calls before which the frame locals are updated. Nuitka currently does that only when an exception occurs. Making the larger change of keeping frame variables readable at all times, by storing locals there, is not going to happen quickly, but this should be a nice opportunity to simply make multimethod.overload a hard import and generate code for its call differently.

@kayhayen kayhayen added this to the 2.5 milestone Jul 31, 2024
@kayhayen kayhayen added the help wanted Please help with this, we think you can label Oct 15, 2024