Describe the bug
Running on Spark 3.4 without Spark Connect makes Spark Expectations (SE) raise a NameError.
```python
self = SparkExpectations(product_id='test_product', rules_df=DataFrame[product_id: string, table_name: string, rule_type: str...appedDataFrameWriter object at 0x7f6af3f65840>, debugger=False, stats_streaming_options={'se.enable.streaming': False})

    def __post_init__(self) -> None:
        # Databricks runtime 14 and above could pass either instance of a Dataframe depending on how data was read
        if (
            is_spark_connect_supported is True and isinstance(self.rules_df, (DataFrame, connectDataFrame))
        ) or (
            is_spark_connect_supported is False and isinstance(self.rules_df, DataFrame)
        ):
E       NameError: name 'connectDataFrame' is not defined
```
To Reproduce
Steps to reproduce the behavior:
Use Python 3.9 or 3.10 with Spark 3.4, without Spark Connect (Spark Remote) installed, and instantiate SparkExpectations.
Expected behavior
ConnectDataFrame should only be accessed if it is actually available. If pyspark is not installed with the remote extras (pyspark[connect]), it will not be available.
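A minimal sketch of the kind of guard that would avoid this (the names `_DATAFRAME_TYPES` and `_is_supported_dataframe` are illustrative, not the actual spark-expectations code, and it assumes the optional import fails with an ImportError when the connect extras are missing):

```python
from pyspark.sql import DataFrame

# The Connect DataFrame class only exists when pyspark is installed with the
# connect/remote extras (pyspark[connect]); fall back to the classic class otherwise.
try:
    from pyspark.sql.connect.dataframe import DataFrame as ConnectDataFrame

    _DATAFRAME_TYPES = (DataFrame, ConnectDataFrame)
except ImportError:
    _DATAFRAME_TYPES = (DataFrame,)


def _is_supported_dataframe(df) -> bool:
    # True for either a classic or a Connect DataFrame, whichever is importable.
    return isinstance(df, _DATAFRAME_TYPES)
```

With a guard like this, the isinstance check in __post_init__ never references a name that was not imported.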
Reference Materials
For reference: https://github.com/Nike-Inc/koheesio/actions/runs/11744019496/job/32718167755?pr=97
Koheesio has a fix for this in the upcoming 0.9 release: https://github.com/Nike-Inc/koheesio/blob/b37a302dbbd16a38e018559d8405009bb2131910/src/koheesio/spark/utils/common.py#L80-L151
Desktop (please complete the following information):
N/A
Additional context
...