SPARK-43877

Fix behavior difference for compare binary functions.

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 3.5.0
    • None
    • None

    Description

      In https://github.com/apache/spark/pull/41362, we added `result = result.fillna(False)` to close the behavior gap between pandas and pandas API on Spark for binary comparisons, but this should be fixed internally on the Spark Connect side rather than patched over in the pandas API layer. See the reproduction below:

       

      import pandas as pd
      import pyspark.pandas as ps
      from pyspark.sql.utils import pyspark_column_op
      
      pser = pd.Series([None, None, None])
      psser = ps.from_pandas(pser)
      pyspark_column_op("__ge__")(psser, psser)
      # Wrong result:
      #  0    None
      #  1    None
      #  2    None
      #  dtype: object
      
      # Expected result:
      pser >= pser  # same operator as "__ge__" above
      #  0    False
      #  1    False
      #  2    False
      #  dtype: bool
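
The gap described above can be illustrated without a Spark session. The following is a minimal sketch in plain Python, not the Spark or pandas implementation: `sql_style_ge` and `fill_false` are hypothetical helpers introduced only for illustration. The first models a null-propagating comparison (any null operand gives a null result), and the second mimics what `result.fillna(False)` does to that result.

```python
from typing import List, Optional


def sql_style_ge(left: Optional[int], right: Optional[int]) -> Optional[bool]:
    """Hypothetical model of a null-propagating `>=`: a null operand yields null."""
    if left is None or right is None:
        return None
    return left >= right


def fill_false(values: List[Optional[bool]]) -> List[bool]:
    """Plain-Python analogue of `result.fillna(False)`: nulls become False."""
    return [False if value is None else value for value in values]


data = [None, None, None]

# Null-propagating comparison: every element stays null.
raw = [sql_style_ge(a, b) for a, b in zip(data, data)]

# After the fillna-style patch the result matches the pandas expectation.
patched = fill_false(raw)

print(raw)      # [None, None, None]
print(patched)  # [False, False, False]
```

The sketch only shows why a post-hoc fill hides the difference; the issue asks for the comparison itself to return the pandas-compatible result so that no fill step is needed.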

       

       

          People

            Assignee: Unassigned
            Reporter: Haejoon Lee (itholic)
            Votes: 0
            Watchers: 1
