[SPARK-12556] Pyspark dataframe unionAll call accepts incorrect input - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 1.4.1
Fix Version/s: None
Component/s: PySpark
Labels: None

Description

I originally encountered this problem with two dataframes that have 8 and 10 columns, respectively. Below is a made-up example that reproduces what I observed going wrong.

Consider the two dataframes:

df1:

---------------+
count
---------------+
---------------+

df2:

----------------------
new_count	count
----------------------
1	4	6
1	5	6
3	6	6
2	7	6
----------------------

The call:

df3 = df1.unionAll(df2)

returns successfully, with df3 containing 2 columns. However, some columns now hold values that belong to other columns. Based on my previous experience, I would say that df3's count column will actually contain the values of the new_count column.

I believe that this call should never complete successfully in the first place and should throw an exception instead.
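Until the call itself validates its input, a guard along the following lines might serve as a workaround. This is only a sketch: `checked_union_all` is a hypothetical helper, not part of PySpark, and it assumes that `unionAll` matches columns by position rather than by name, which is consistent with the swapped values described above. It uses only `DataFrame.columns`, `DataFrame.select` and `DataFrame.unionAll`, and it compares column names only, not column types.

```python
def checked_union_all(left, right):
    """Union two DataFrames only if they have the same column names.

    Hypothetical helper, not part of PySpark. It relies only on
    DataFrame.columns, DataFrame.select and DataFrame.unionAll.
    """
    left_cols = list(left.columns)
    right_cols = list(right.columns)

    # Refuse inputs whose column names differ (order is ignored here).
    if sorted(left_cols) != sorted(right_cols):
        raise ValueError(
            "cannot union: columns differ: %r vs %r" % (left_cols, right_cols)
        )

    # Assumption: unionAll pairs columns by position, so reorder the
    # right-hand columns to the left-hand order before combining.
    return left.unionAll(right.select(*left_cols))
```

With this guard, `df3 = checked_union_all(df1, df2)` would raise a `ValueError` when the column names of the two dataframes differ, instead of returning a dataframe with mixed-up values.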

Issue Links

Is contained by

SPARK-9813 Incorrect UNION ALL behavior

Resolved

People

Assignee: Unassigned
Reporter: Aravind B
Votes: 0
Watchers: 1

Dates

Created: 29/Dec/15 13:08
Updated: 29/Dec/15 13:21
Resolved: 29/Dec/15 13:21