Spark / SPARK-31186

toPandas fails on simple query (collect() works)


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.4
    • Fix Version/s: 2.4.6, 3.0.0
    • Component/s: PySpark
    • Labels: None

    Description

      My pandas is 0.25.1.

      I ran the following simple code (cross joins are enabled):

      spark.sql('''
      select t1.*, t2.* from (
        select explode(sequence(1, 3)) v
      ) t1 left join (
        select explode(sequence(1, 3)) v
      ) t2
      ''').toPandas()
      

      and got a ValueError from pandas:

      > ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

      Collect works fine:

      spark.sql('''
      select * from (
        select explode(sequence(1, 3)) v
      ) t1 left join (
        select explode(sequence(1, 3)) v
      ) t2
      ''').collect()
      # [Row(v=1, v=1),
      #  Row(v=1, v=2),
      #  Row(v=1, v=3),
      #  Row(v=2, v=1),
      #  Row(v=2, v=2),
      #  Row(v=2, v=3),
      #  Row(v=3, v=1),
      #  Row(v=3, v=2),
      #  Row(v=3, v=3)]
      

      I imagine it's related to the duplicate column names, but this doesn't fail:

      spark.sql("select 1 v, 1 v").toPandas()
      # v	v
      # 0	1	1
      

      Also no issue for multiple rows:

      spark.sql("select 1 v, 1 v union all select 1 v, 2 v").toPandas()

      It also works when not using a cross join but a janky, programmatically generated union all query:

      cond = []
      for ii in range(3):
          for jj in range(3):
              cond.append(f'select {ii+1} v, {jj+1} v')
      spark.sql(' union all '.join(cond)).toPandas()
      

      As near as I can tell, the output is identical to the explode output, which makes this issue all the more peculiar. I thought toPandas() was applied to the output of collect(), so if collect() gives the same output, how can toPandas() fail in one case and not the other? Further, the lazy DataFrame is the same in both cases: DataFrame[v: int, v: int]. I must be missing something.
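
      A plausible mechanism, sketched in plain pandas (this is an assumption about Spark's internal non-Arrow toPandas() dtype-correction path, not confirmed from the source): if that path looks up each column by name to check for nulls, then with duplicate labels `pdf[name]` returns a DataFrame rather than a Series, the null check yields a per-column Series, and using that Series in a boolean condition raises exactly the reported ValueError.

      import pandas as pd

      # Duplicate column labels, with a null as a left join could produce
      pdf = pd.DataFrame([[1, None], [2, 3]], columns=["v", "v"])

      # With duplicate labels, name-based selection returns a DataFrame,
      # not a Series
      col = pdf["v"]
      print(type(col))  # <class 'pandas.core.frame.DataFrame'>

      # .isnull().any() on a DataFrame yields a per-column boolean Series...
      mask = col.isnull().any()

      # ...and using that Series in a boolean context raises the same error
      try:
          if mask:
              pass
      except ValueError as e:
          print(e)  # The truth value of a Series is ambiguous. ...

      This would also be consistent with `select 1 v, 1 v` not failing: if the null check only runs for nullable columns (again an assumption), literal columns would skip it, while the left join makes t2's column nullable and triggers the lookup.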

          People

            Assignee: viirya L. C. Hsieh
            Reporter: michaelchirico Michael Chirico
