Spark / SPARK-25156

Same query returns different results



    • Type: Question
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.1.1
    • Fix Version/s: None
    • Component/s: Spark Core
    • Environment:
      • Spark Version: 2.1.1
      • Java Version: Java 7
      • Scala Version: 2.11.8


      I performed two inner joins and two left outer joins on five tables.

      Running the same query multiple times returns different results.

      Table A

      Column   Type      Nullable
      a        Long      false
      b        Integer   false
      c        String    true
      d        String    false

      Table B

      Column   Type      Nullable
      a        Long      false
      b        String    false

      Table C

      Column   Type      Nullable
      a        Integer   false
      b        String    false

      Table D

      Column   Type      Nullable
      a        Long      true
      b        Long      false
      c        Integer   false

      Table E

      Column   Type      Nullable
      a        Long      false
      b        Integer   false
      c        String    (not specified)

      Query (Spark SQL)

      select A.c, B.b, C.b, D.c, E.c
      from A
      inner join B on A.a = B.a
      inner join C on A.b = C.a
      left outer join D on A.d <=> cast(D.a as string)
      left outer join E on D.b = E.a and D.c = E.b


      I ran the above query 10 times: 7 runs returned the correct result (count: 830001460) and 3 runs returned an incorrect result (count: 830001299).
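      The repeated-run check described above can be scripted instead of done by hand. This is a minimal Scala sketch, assuming a hypothetical helper `tallyCounts` (not a Spark API) that evaluates a count-producing expression a fixed number of times and tallies how often each distinct count appears; a deterministic query should produce exactly one entry. The `main` method uses a constant stand-in so the sketch runs without Spark.

```scala
// Sketch of a repeated-run check for nondeterministic query results.
// `RepeatCheck` and `tallyCounts` are hypothetical names, not Spark APIs.
object RepeatCheck {
  // Evaluates the by-name `runQuery` `runs` times and maps each distinct
  // returned count to the number of runs that produced it.
  def tallyCounts(runs: Int)(runQuery: => Long): Map[Long, Int] =
    (1 to runs)
      .map(_ => runQuery)
      .groupBy(identity)
      .map { case (count, hits) => count -> hits.size }

  def main(args: Array[String]): Unit = {
    // Constant stand-in so the sketch runs without Spark; in spark-shell,
    // pass spark.sql(query).count() instead.
    val tally = tallyCounts(10)(42L)
    if (tally.size == 1) println(s"consistent: $tally")
    else println(s"inconsistent: $tally")
  }
}
```

      In spark-shell, one would call `spark.conf.set("spark.sql.shuffle.partitions", "801")` and then `RepeatCheck.tallyCounts(10)(spark.sql(query).count())`, where `query` is assumed to hold the SQL text above; more than one key in the resulting map means the query is returning different results across runs.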


      Additionally, I executed

      sql("set spark.sql.shuffle.partitions=801")

      before running the query.

      Tables A and B have many rows, but table C is small. In the physical plan, the A <> B join is performed with SortMergeJoin and the (A, B) <> C join with BroadcastHashJoin.


      When I removed the set spark.sql.shuffle.partitions statement, the query worked correctly.

      Is this a Spark SQL bug?




            Assignee: Unassigned
            Reporter: Yonghwan Lee (leeyh0216)
            Votes: 0
            Watchers: 3



