Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-28881

toPandas with Arrow should not return a DataFrame when the result size exceeds `spark.driver.maxResultSize`

    XMLWordPrintableJSON

    Details

    • Type: Test
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0, 2.4.1, 2.4.2, 2.4.3
    • Fix Version/s: 2.4.4, 3.0.0
    • Component/s: PySpark, SQL
    • Labels:
      None

      Description

      ./bin/pyspark --conf spark.driver.maxResultSize=1m
      spark.conf.set("spark.sql.execution.arrow.enabled",True)
      spark.range(10000000).toPandas()
      

      The codes above returns an empty dataframe in Spark 2.4 but It should throw an exception as below:

      py4j.protocol.Py4JJavaError: An error occurred while calling o31.collectToPython.
      : org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 1 tasks (3.0 MB) is bigger than spark.driver.maxResultSize (1024.0 KB)
      

      This is a regression between Spark 2.3 and 2.4.
      This JIRA targets to add a regression test.

      In Spark 2.4:

      ./bin/pyspark --conf spark.driver.maxResultSize=1m
      spark.conf.set("spark.sql.execution.arrow.enabled",True)
      spark.range(10000000).toPandas()
      
      Empty DataFrame
      Columns: [id]
      Index: []
      

      or it can return partial results:

      ./bin/pyspark --conf spark.driver.maxResultSize=1m
      spark.conf.set("spark.sql.execution.arrow.enabled", True)
      spark.range(0, 330000, 1, 100).toPandas()
      
      ...
      75897  75897
      75898  75898
      75899  75899
      
      [75900 rows x 1 columns]
      

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                hyukjin.kwon Hyukjin Kwon
                Reporter:
                hyukjin.kwon Hyukjin Kwon
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: