[SPARK-43877] Fix behavior difference for compare binary functions. - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.5.0
Fix Version/s: None
Component/s: Pandas API on Spark, PySpark
Labels: None
Epic Link: Spark Connect

Description

In https://github.com/apache/spark/pull/41362, we added `result = result.fillna(False)` to bridge a behavior gap between pandas and pandas API on Spark, but this should be fixed internally on the Spark Connect side. See the reproducible code below:

import pandas as pd
import pyspark.pandas as ps
from pyspark.sql.utils import pyspark_column_op

pser = pd.Series([None, None, None])
psser = ps.from_pandas(pser)
pyspark_column_op("__ge__")(psser, psser)
# Wrong result:
#  0    None
#  1    None
#  2    None
#  dtype: object

# Expected result (what pandas returns for the same comparison):
pser >= pser
#  0    False
#  1    False
#  2    False
#  dtype: bool
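
For illustration only, the gap can be sketched in plain Python, with no Spark involved. The helper names `sql_ge` and `fill_false` below are hypothetical and are not part of PySpark. The sketch assumes the Spark-side comparison follows SQL three-valued logic (a null operand yields a null result), while pandas returns False for such comparisons on object-dtype data. `fill_false` mimics what the `fillna(False)` workaround does to the result.

```python
from typing import Any, List, Optional


def sql_ge(left: List[Any], right: List[Any]) -> List[Optional[bool]]:
    """Element-wise >= with SQL-style null propagation (hypothetical helper)."""
    return [
        None if a is None or b is None else a >= b
        for a, b in zip(left, right)
    ]


def fill_false(values: List[Optional[bool]]) -> List[bool]:
    """Replace null results with False, like result.fillna(False) (hypothetical helper)."""
    return [False if v is None else v for v in values]


nulls = [None, None, None]
raw = sql_ge(nulls, nulls)   # null-propagating result, as in the "wrong result" above
patched = fill_false(raw)    # pandas-compatible result after the workaround

print(raw)      # [None, None, None]
print(patched)  # [False, False, False]
```

The request in this issue is that the null-to-False step happen inside the Spark Connect code path rather than as a post-processing fill on the pandas API on Spark side.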

People

Assignee: Unassigned

Reporter: Haejoon Lee

Votes: 0

Watchers: 1

Dates

Created: 30/May/23 05:49

Updated: 22/Sep/23 06:17