Same query returns different results

      I ran a query with two inner joins and two left outer joins across five tables.

      Running the same query multiple times returns different results.

      Table A

      Column a                Column b                  Column c                Column d
      Long(nullable: false)   Integer(nullable: false)  String(nullable: true)  String(nullable: false)

      Table B

      Column a                Column b
      Long(nullable: false)   String(nullable: false)

      Table C

      Column a                   Column b
      Integer(nullable: false)   String(nullable: false)

      Table D

      Column a               Column b                Column c
      Long(nullable: true)   Long(nullable: false)   Integer(nullable: false)

      Table E

      Column a                Column b                   Column c
      Long(nullable: false)   Integer(nullable: false)   String

      Query (Spark SQL)

      select A.c, B.b, C.b, D.c, E.c
      from A
      inner join B on A.a = B.a
      inner join C on A.b = C.a
      left outer join D on A.d <=> cast(D.a as string)
      left outer join E on D.b = E.a and D.c = E.b


      I ran the above query 10 times: 7 runs returned the correct result (count: 830001460) and 3 runs returned an incorrect result (count: 830001299).


      Additionally, I run

      sql("set spark.sql.shuffle.partitions=801")

      before executing the query.

      Tables A and B have a lot of rows, but table C is a small dataset. In the physical plan, the A <> B join is performed with SortMergeJoin and the (A, B) <> C join is performed with BroadcastHashJoin.
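
A minimal sketch of how the repeated runs and the physical plan could be checked in one place. It assumes tables A to E are already registered as tables or temporary views in the session, and it adds an explicit `from A` clause; the object name `RepeatCount` and the application name are placeholders, not part of the original job.

```scala
import org.apache.spark.sql.SparkSession

object RepeatCount {
  def main(args: Array[String]): Unit = {
    // Assumption: master and other settings are supplied by spark-submit.
    val spark = SparkSession.builder().appName("repeat-count").getOrCreate()

    // Same setting as in the question.
    spark.sql("set spark.sql.shuffle.partitions=801")

    // Assumption: A, B, C, D and E are already registered in this session.
    val query =
      """select A.c, B.b, C.b, D.c, E.c
        |from A
        |inner join B on A.a = B.a
        |inner join C on A.b = C.a
        |left outer join D on A.d <=> cast(D.a as string)
        |left outer join E on D.b = E.a and D.c = E.b""".stripMargin

    // Run the query 10 times and record the row count of each run.
    val counts = (1 to 10).map(_ => spark.sql(query).count())

    // Summarise how often each distinct count was observed.
    counts.groupBy(identity).foreach { case (count, runs) =>
      println(s"count=$count observed in ${runs.size} run(s)")
    }

    // Print the physical plan to see which join strategy each join uses.
    spark.sql(query).explain()

    spark.stop()
  }
}
```

If every run prints the same count, the summary has a single line; more than one line means the runs disagree.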


      After I removed the set spark.sql.shuffle.partitions statement, it works fine.

      Is this a Spark SQL bug?


