Spark / SPARK-27778

toPandas with arrow enabled fails for DF with no partitions

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      Calling toPandas with spark.sql.execution.arrow.enabled set to true fails for DataFrames with no partitions. The error is an EOFError. With spark.sql.execution.arrow.enabled set to false, the conversion succeeds.

      Repro (on current master branch):

      >>> from pyspark.sql.types import *
      >>> schema = StructType([StructField("field1", StringType(), True)])
      >>> df = spark.createDataFrame(sc.emptyRDD(), schema)
      >>> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
      >>> df.toPandas()
      /Users/dvogelbacher/git/spark/python/pyspark/sql/dataframe.py:2162: UserWarning: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true, but has reached the error below and can not continue. Note that 'spark.sql.execution.arrow.fallback.enabled' does not have an effect on failures in the middle of computation.
      
        warnings.warn(msg)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/Users/dvogelbacher/git/spark/python/pyspark/sql/dataframe.py", line 2143, in toPandas
          batches = self._collectAsArrow()
        File "/Users/dvogelbacher/git/spark/python/pyspark/sql/dataframe.py", line 2205, in _collectAsArrow
          results = list(_load_from_socket(sock_info, ArrowCollectSerializer()))
        File "/Users/dvogelbacher/git/spark/python/pyspark/serializers.py", line 210, in load_stream
          num = read_int(stream)
        File "/Users/dvogelbacher/git/spark/python/pyspark/serializers.py", line 810, in read_int
          raise EOFError
      EOFError
      >>> spark.conf.set("spark.sql.execution.arrow.enabled", "false")
      >>> df.toPandas()
      Empty DataFrame
      Columns: [field1]
      Index: []
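
On builds without the fix, one possible workaround is to skip the Arrow path when the DataFrame has no partitions. The sketch below is an illustration only, not the change that resolved this issue. The helper name to_pandas_safe is hypothetical, and the sketch assumes that df.rdd.getNumPartitions() returns 0 for a DataFrame built from sc.emptyRDD(), as in the repro above.

```python
ARROW_KEY = "spark.sql.execution.arrow.enabled"


def to_pandas_safe(spark, df):
    """Hypothetical helper: convert df to pandas, avoiding Arrow when df has no partitions.

    Assumes `spark` is the active SparkSession and `df` a PySpark DataFrame.
    """
    # DataFrames with at least one partition are not affected by this bug,
    # so they go through whatever path the session is configured for.
    if df.rdd.getNumPartitions() > 0:
        return df.toPandas()

    # No partitions: temporarily disable Arrow, convert, then restore the
    # previous setting even if the conversion raises.
    previous = spark.conf.get(ARROW_KEY, "false")
    spark.conf.set(ARROW_KEY, "false")
    try:
        return df.toPandas()
    finally:
        spark.conf.set(ARROW_KEY, previous)
```

In the repro session this would be called as to_pandas_safe(spark, df) in place of df.toPandas(). Note that accessing df.rdd has some overhead of its own, so the check is best kept to code paths where empty inputs are plausible.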
      

            People

              Assignee: David Vogelbacher (dvogelbacher)
              Reporter: David Vogelbacher (dvogelbacher)