[SPARK-25156] Same query returns different result - ASF JIRA

Details

Type: Question
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.1.1
Fix Version/s: None
Component/s: Spark Core
Labels:
- Question
Environment:
- Spark Version: 2.1.1
- Java Version: Java 7
- Scala Version: 2.11.8

Description

I performed two inner joins and two left outer joins on five tables.

Running the same query multiple times returns different results.

Table A

- Column a: Long (nullable: false)
- Column b: Integer (nullable: false)
- Column c: String (nullable: true)
- Column d: String (nullable: false)

Table B

- Column a: Long (nullable: false)
- Column b: String (nullable: false)

Table C

- Column a: Integer (nullable: false)
- Column b: String (nullable: false)

Table D

- Column a: Long (nullable: true)
- Column b: Long (nullable: false)
- Column c: Integer (nullable: false)

Table E

- Column a: Long (nullable: false)
- Column b: Integer (nullable: false)
- Column c: String

Query (Spark SQL)

select A.c, B.b, C.b, D.c, E.c
from A
inner join B on A.a = B.a
inner join C on A.b = C.a
left outer join D on A.d <=> cast(D.a as string)
left outer join E on D.b = E.a and D.c = E.b
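The join with D uses the null-safe equality operator <=>, while the join with E uses plain =. The following is a minimal, hypothetical Scala sketch (plain collections, not Spark code) of how the two operators differ under SQL's three-valued logic, modelling NULL as None, and of how a left outer join keeps left rows that have no match. The object and method names are made up for illustration.

```scala
// Hypothetical sketch, not Spark's implementation: SQL NULL is modelled as None.
object NullSafeJoinSketch {
  // SQL `a = b`: the result is NULL (None) if either side is NULL, otherwise Some(a == b).
  def sqlEq[T](a: Option[T], b: Option[T]): Option[Boolean] =
    for (x <- a; y <- b) yield x == y

  // SQL `a <=> b` (null-safe equality): never NULL; NULL <=> NULL is true,
  // and NULL <=> non-NULL is false.
  def nullSafeEq[T](a: Option[T], b: Option[T]): Boolean = a == b

  // Left outer join: every left row is kept; it is paired with each right row
  // satisfying `cond`, or with None when no right row matches.
  def leftOuterJoin[L, R](left: Seq[L], right: Seq[R])(cond: (L, R) => Boolean): Seq[(L, Option[R])] =
    left.flatMap { l =>
      val matches = right.filter(r => cond(l, r))
      if (matches.isEmpty) Seq((l, Option.empty[R]))
      else matches.map(r => (l, Some(r): Option[R]))
    }

  def main(args: Array[String]): Unit = {
    // Toy rows (illustrative only): A.d is a non-nullable string, D.a a nullable long.
    val aD: Seq[String] = Seq("1", "2", "3")
    val dA: Seq[Option[Long]] = Seq(Some(1L), None, Some(3L), Some(3L))

    // Mirrors `A.d <=> cast(D.a as string)` from the query above.
    val joined = leftOuterJoin(aD, dA)((ad, da) => nullSafeEq(Some(ad), da.map(_.toString)))
    joined.foreach(println)
  }
}
```

Because A.d is declared non-nullable, a NULL D.a can never match under either operator in this query; <=> differs from = only when both sides are NULL. Likewise, when a row of A finds no match in D, D.b and D.c are NULL, so the = comparisons in the join with E cannot succeed and E's columns are NULL for that row.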

I ran the above query 10 times: 7 runs returned the correct result (count: 830001460) and 3 runs returned an incorrect result (count: 830001299).

In addition, before running the query I executed:

sql("set spark.sql.shuffle.partitions=801")

Tables A and B have many rows, but table C is small. In the physical plan, the A <-> B join is performed with SortMergeJoin and the (A, B) <-> C join with BroadcastHashJoin.
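SortMergeJoin and BroadcastHashJoin are two physical strategies for the same logical equi-join. As a rough, hypothetical sketch on in-memory Scala collections (Spark's real operators are distributed and considerably more involved, and the names below are made up), the hash variant builds a lookup table from the small side and streams the large side, while the sort-merge variant sorts both sides by key and walks them in step:

```scala
// Hypothetical sketch: two ways to compute the same inner equi-join on in-memory rows.
object JoinStrategiesSketch {
  // Broadcast-hash-join style: build a hash map from the small side, then stream the large side.
  def hashJoin[K, L, R](large: Seq[(K, L)], small: Seq[(K, R)]): Seq[(K, L, R)] = {
    val built: Map[K, Seq[R]] = small.groupBy(_._1).map { case (k, rows) => k -> rows.map(_._2) }
    large.flatMap { case (k, l) => built.getOrElse(k, Seq.empty).map(r => (k, l, r)) }
  }

  // Sort-merge-join style: sort both sides by key, then advance two cursors in step.
  def sortMergeJoin[K: Ordering, L, R](left: Seq[(K, L)], right: Seq[(K, R)]): Seq[(K, L, R)] = {
    val ord = implicitly[Ordering[K]]
    val ls = left.sortBy(_._1).toVector
    val rs = right.sortBy(_._1).toVector
    val out = Vector.newBuilder[(K, L, R)]
    var i = 0
    var j = 0
    while (i < ls.length && j < rs.length) {
      val c = ord.compare(ls(i)._1, rs(j)._1)
      if (c < 0) i += 1      // left key is smaller: advance the left cursor
      else if (c > 0) j += 1 // right key is smaller: advance the right cursor
      else {
        // Equal keys: find the run of this key on each side and emit their cross product.
        val key = ls(i)._1
        var iEnd = i
        while (iEnd < ls.length && ord.equiv(ls(iEnd)._1, key)) iEnd += 1
        var jEnd = j
        while (jEnd < rs.length && ord.equiv(rs(jEnd)._1, key)) jEnd += 1
        for (a <- i until iEnd; b <- j until jEnd) out += ((key, ls(a)._2, rs(b)._2))
        i = iEnd
        j = jEnd
      }
    }
    out.result()
  }
}
```

Both functions return the same multiset of rows for any input (only the row order differs), so in a correct execution the choice of join strategy alone should not change the query's row count.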

After I removed the "set spark.sql.shuffle.partitions" statement, the query works fine.

Is this a bug in Spark SQL?

Issue Links

duplicates
- SPARK-23207 Shuffle+Repartition on an DataFrame could lead to incorrect answers (Resolved)
- SPARK-23243 Shuffle+Repartition on an RDD could lead to incorrect answers (Resolved)

People

Assignee: Unassigned

Reporter: Yonghwan Lee

Votes: 0

Watchers: 3

Dates

Created: 19/Aug/18 06:24

Updated: 20/Aug/18 02:21

Resolved: 20/Aug/18 02:21