[BUG] Spark 3.4 (without Connect) makes SE throw an error #114

Closed
dannymeijer opened this issue Nov 8, 2024 · 2 comments · Fixed by #115
Labels
bug Something isn't working

Comments

@dannymeijer
Member

Describe the bug
Running Spark Expectations (SE) on Spark 3.4 without Spark Connect installed raises a NameError:

self = SparkExpectations(product_id='test_product', rules_df=DataFrame[product_id: string, table_name: string, rule_type: str...appedDataFrameWriter object at 0x7f6af3f65840>, debugger=False, stats_streaming_options={'se.enable.streaming': False})

    def __post_init__(self) -> None:
        # Databricks runtime 14 and above could pass either instance of a Dataframe depending on how data was read
        if (
            is_spark_connect_supported is True
>           and isinstance(self.rules_df, (DataFrame, connectDataFrame))
        ) or (
            is_spark_connect_supported is False and isinstance(self.rules_df, DataFrame)
        ):
E       NameError: name 'connectDataFrame' is not defined

To Reproduce
Steps to reproduce the behavior:

  1. Use Python 3.9 or 3.10 on Spark 3.4, without Spark Connect (Spark Remote, i.e. the pyspark[connect] extra) installed
  2. Try to invoke SparkExpectations, e.g. as done here (a reproduction sketch follows this list): https://github.com/Nike-Inc/koheesio/blob/6d6ccbd1167b78c16719d3df62bc03193ab0f5d2/src/koheesio/integrations/spark/dq/spark_expectations.py#L136C17-L146C19
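A minimal reproduction sketch (not taken verbatim from the issue): the import path and constructor arguments are inferred from the traceback above and typical spark-expectations 2.x usage, and the table names and writer settings are purely illustrative.

```python
# Assumed environment: pip install "pyspark==3.4.*" WITHOUT the [connect] extra.
from pyspark.sql import SparkSession
from spark_expectations.core.expectations import SparkExpectations, WrappedDataFrameWriter

spark = SparkSession.builder.getOrCreate()
writer = WrappedDataFrameWriter().mode("append").format("delta")  # illustrative writer config

se = SparkExpectations(  # raises NameError: name 'connectDataFrame' is not defined
    product_id="test_product",
    rules_df=spark.table("dq_spark.dq_rules"),   # any classic pyspark.sql.DataFrame
    stats_table="dq_spark.dq_stats",             # illustrative table name
    stats_table_writer=writer,
    target_and_error_table_writer=writer,
    debugger=False,
    stats_streaming_options={"se.enable.streaming": False},
)
```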

Expected behavior
ConnectDataFrame should only be accessed when it is actually available. If pyspark is installed without the remote/Connect extra (pyspark[connect]), the class cannot be imported and must not be referenced. A guarded-import sketch is shown below.
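A minimal sketch of the kind of guard described above (this is not the actual change in #115, and the names DATAFRAME_TYPES and is_dataframe are illustrative); it mirrors the guarded-import approach in the Koheesio code linked under Reference Materials:

```python
from pyspark.sql import DataFrame

try:
    # Only resolvable when pyspark is installed with the [connect] extra
    # (grpcio, pandas, pyarrow, ...); otherwise this import raises ImportError.
    from pyspark.sql.connect.dataframe import DataFrame as ConnectDataFrame

    DATAFRAME_TYPES = (DataFrame, ConnectDataFrame)
except ImportError:
    # Spark Connect is not installed; fall back to the classic DataFrame only.
    DATAFRAME_TYPES = (DataFrame,)


def is_dataframe(obj) -> bool:
    """True for any supported DataFrame flavour, with or without Connect installed."""
    return isinstance(obj, DATAFRAME_TYPES)
```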

Reference Materials
For reference: https://github.com/Nike-Inc/koheesio/actions/runs/11744019496/job/32718167755?pr=97
Koheesio addresses this in its upcoming 0.9 release: https://github.com/Nike-Inc/koheesio/blob/b37a302dbbd16a38e018559d8405009bb2131910/src/koheesio/spark/utils/common.py#L80-L151

Desktop (please complete the following information):
N/A

Additional context
...

@dannymeijer dannymeijer added the bug Something isn't working label Nov 8, 2024
dannymeijer added a commit that referenced this issue Nov 8, 2024
@dannymeijer dannymeijer linked a pull request Nov 8, 2024 that will close this issue
@dannymeijer dannymeijer mentioned this issue Nov 8, 2024
@stampthecoder

Awesome! It's finally here!

@asingamaneni asingamaneni changed the title [BUG] Please add your bug title here [BUG] Spark 3.4 (without Connect) makes SE throw an error Nov 8, 2024
@dannymeijer
Member Author

Looking forward to seeing this released

asingamaneni pushed a commit that referenced this issue Nov 11, 2024
* fix for #114

* update prospector, pylint, makefile command

* revert

* loosen dependencies

* poetry lock

* additional coverage