Description
Calling toPandas() with spark.sql.execution.arrow.enabled: true fails for DataFrames with no partitions, raising an EOFError. With spark.sql.execution.arrow.enabled: false the conversion succeeds.
Repro (on current master branch):
>>> from pyspark.sql.types import *
>>> schema = StructType([StructField("field1", StringType(), True)])
>>> df = spark.createDataFrame(sc.emptyRDD(), schema)
>>> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
>>> df.toPandas()
/Users/dvogelbacher/git/spark/python/pyspark/sql/dataframe.py:2162: UserWarning: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true, but has reached the error below and can not continue. Note that 'spark.sql.execution.arrow.fallback.enabled' does not have an effect on failures in the middle of computation.
  warnings.warn(msg)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dvogelbacher/git/spark/python/pyspark/sql/dataframe.py", line 2143, in toPandas
    batches = self._collectAsArrow()
  File "/Users/dvogelbacher/git/spark/python/pyspark/sql/dataframe.py", line 2205, in _collectAsArrow
    results = list(_load_from_socket(sock_info, ArrowCollectSerializer()))
  File "/Users/dvogelbacher/git/spark/python/pyspark/serializers.py", line 210, in load_stream
    num = read_int(stream)
  File "/Users/dvogelbacher/git/spark/python/pyspark/serializers.py", line 810, in read_int
    raise EOFError
EOFError
>>> spark.conf.set("spark.sql.execution.arrow.enabled", "false")
>>> df.toPandas()
Empty DataFrame
Columns: [field1]
Index: []