  1. Spark
  2. SPARK-25156

Same query returns different results

    Details

    • Type: Question
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.1.1
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels:
    • Environment:
      • Spark Version: 2.1.1
      • Java Version: Java 7
      • Scala Version: 2.11.8

      Description

      I ran a query with two inner joins and two left outer joins across five tables.

      Running the same query multiple times returns different results.

      Table A

      Column a               Column b                  Column c                Column d
      Long(nullable: false)  Integer(nullable: false)  String(nullable: true)  String(nullable: false)

      Table B

      Column a               Column b
      Long(nullable: false)  String(nullable: false)

      Table C

      Column a                  Column b
      Integer(nullable: false)  String(nullable: false)

      Table D

      Column a              Column b               Column c
      Long(nullable: true)  Long(nullable: false)  Integer(nullable: false)

      Table E

      Column a               Column b                  Column c
      Long(nullable: false)  Integer(nullable: false)  String

      Query (Spark SQL)

      select A.c, B.b, C.b, D.c, E.c
      from A
      inner join B on A.a = B.a
      inner join C on A.b = C.a
      left outer join D on A.d <=> cast(D.a as string)
      left outer join E on D.b = E.a and D.c = E.b

       

      I ran the above query 10 times: 7 runs returned the correct result (count: 830001460) and 3 runs returned an incorrect result (count: 830001299).
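
One way to quantify the inconsistency described above is to repeat the count and tally the distinct values. The following is a minimal sketch, not the reporter's code: `countRuns` and `tally` are hypothetical helper names, and the count is passed in as a function so that the helpers themselves do not depend on Spark.

```scala
// Hypothetical helper: evaluates `runOnce` `runs` times and returns each result in order.
def countRuns(runs: Int)(runOnce: () => Long): Seq[Long] =
  (1 to runs).map(_ => runOnce())

// Hypothetical helper: maps each distinct count to the number of runs that produced it.
// A single key means all runs agreed; several keys mean the runs disagreed.
def tally(counts: Seq[Long]): Map[Long, Int] =
  counts.groupBy(identity).map { case (count, hits) => count -> hits.size }
```

In spark-shell, assuming `query` holds the SQL text above, this could be called as `tally(countRuns(10)(() => spark.sql(query).count()))`. More than one key in the resulting map indicates that the runs disagree.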

       

      Additionally, I executed

      sql("set spark.sql.shuffle.partitions=801")

      before running the query.

      Tables A and B have many rows, but table C is small. When I looked at the physical plan, the join of A and B was performed as a SortMergeJoin, and the join of (A, B) with C was performed as a broadcast hash join.
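
To check which join strategy each step received, the physical plan text can be scanned for join operator names. This is only a sketch under assumptions: `joinOps` and `joinOperators` are hypothetical names, and the matching is a plain substring search over the plan text rather than a traversal of Spark's plan tree.

```scala
// Physical join operator names as they appear in Spark's plan output.
val joinOps = Seq(
  "SortMergeJoin",
  "BroadcastHashJoin",
  "ShuffledHashJoin",
  "BroadcastNestedLoopJoin",
  "CartesianProduct"
)

// Hypothetical helper: returns the join operators mentioned in a plan string,
// one per matching line, in the order the lines appear (top of the plan first).
def joinOperators(planText: String): Seq[String] =
  planText.split("\n").toSeq.flatMap(line => joinOps.find(op => line.contains(op)))
```

In spark-shell, assuming `query` holds the SQL text, the plan string could be obtained with `spark.sql(query).queryExecution.executedPlan.toString` and passed to `joinOperators`. Calling `spark.sql(query).explain()` prints the same plan for reading by eye.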

       

      After I removed the set spark.sql.shuffle.partitions statement, the query works fine.

      Is this a bug in Spark SQL?

              People

              • Assignee:
                Unassigned
                Reporter:
                leeyh0216 Yonghwan Lee
              • Votes:
                0
                Watchers:
                3
