Details
Improvement
Status: Open
Major
Resolution: Unresolved
3.5.0
None
None
Description
In https://github.com/apache/spark/pull/41362, we added `result = result.fillna(False)` to close a behavior gap between pandas and pandas API on Spark, but this should be fixed internally on the Spark Connect side. Please refer to the reproducible code below:
import pandas as pd
import pyspark.pandas as ps
from pyspark.sql.utils import pyspark_column_op

pser = pd.Series([None, None, None])
psser = ps.from_pandas(pser)
pyspark_column_op("__ge__")(psser, psser)

# Wrong result:
# 0    None
# 1    None
# 2    None
# dtype: object

# Expected result:
pser >= pser
# 0    False
# 1    False
# 2    False
# dtype: bool
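
The gap described above can be sketched in plain Python, without Spark or pandas. This is only an illustration under stated assumptions: `sql_style_ge` and `fill_false` are hypothetical helpers, not part of PySpark. The first mimics a SQL-like comparison where a null operand yields a null result, and the second mimics the effect of `result.fillna(False)`.

```python
from typing import Any, List, Optional


def sql_style_ge(left: Any, right: Any) -> Optional[bool]:
    """Hypothetical helper: a null operand makes the whole comparison null."""
    if left is None or right is None:
        return None
    return left >= right


def fill_false(values: List[Optional[bool]]) -> List[bool]:
    """Hypothetical helper: replace null results with False, like fillna(False)."""
    return [False if v is None else v for v in values]


data = [None, None, None]

# Null-propagating comparison, analogous to the "wrong result" in the report.
raw = [sql_style_ge(a, b) for a, b in zip(data, data)]

# After filling nulls with False, the result matches the pandas expectation.
fixed = fill_false(raw)

print(raw)    # [None, None, None]
print(fixed)  # [False, False, False]
```

The sketch only shows why the fill step makes the results line up with pandas; where that step should live (in the pandas API on Spark layer or inside Spark Connect) is the open question of this ticket.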