Spark / SPARK-25156

Same query returns different result


Details

    • Type: Question
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.1.1
    • Fix Version/s: None
    • Component/s: Spark Core
    • Environment:
      • Spark Version: 2.1.1
      • Java Version: Java 7
      • Scala Version: 2.11.8

    Description

      I ran a query with two inner joins and two left outer joins across five tables.

      Running the same query multiple times returns different results.

      Table A

      | Column a              | Column b                 | Column c               | Column d                |
      | Long(nullable: false) | Integer(nullable: false) | String(nullable: true) | String(nullable: false) |

      Table B

      | Column a              | Column b                |
      | Long(nullable: false) | String(nullable: false) |

      Table C

      | Column a                 | Column b                |
      | Integer(nullable: false) | String(nullable: false) |

      Table D

      | Column a             | Column b              | Column c                 |
      | Long(nullable: true) | Long(nullable: false) | Integer(nullable: false) |

      Table E

      | Column a              | Column b                 | Column c |
      | Long(nullable: false) | Integer(nullable: false) | String   |

      Query (Spark SQL)

      select A.c, B.b, C.b, D.c, E.c
      from A
      inner join B on A.a = B.a
      inner join C on A.b = C.a
      left outer join D on A.d <=> cast(D.a as string)
      left outer join E on D.b = E.a and D.c = E.b

       

      I ran the query above 10 times: 7 runs returned the correct result (count: 830001460) and 3 runs returned an incorrect result (count: 830001299).
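
The repeated-run check described here could be scripted roughly as follows. This is only a minimal sketch: `tally_counts` and `is_deterministic` are hypothetical helper names, and `run_query` is assumed to be any zero-argument callable that executes the query once and returns its row count (for example, a wrapper around `spark.sql(query).count()` in PySpark, which is an assumption since the session in this report appears to use the Scala shell).

```python
from collections import Counter
from typing import Callable, Dict


def tally_counts(run_query: Callable[[], int], runs: int = 10) -> Dict[int, int]:
    """Call run_query `runs` times; map each distinct row count to its frequency."""
    tally = Counter(run_query() for _ in range(runs))
    return dict(tally)


def is_deterministic(tally: Dict[int, int]) -> bool:
    """A query with stable output produces exactly one distinct row count."""
    return len(tally) == 1


if __name__ == "__main__":
    # Stand-in for the real query: replays the counts reported in this issue
    # instead of contacting a Spark cluster.
    observed = iter([830001460] * 7 + [830001299] * 3)
    tally = tally_counts(lambda: next(observed), runs=10)
    print(tally)
    print("deterministic:", is_deterministic(tally))
```

Running the same tally once with `spark.sql.shuffle.partitions` set and once without it would make the comparison between the two configurations explicit.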

       

      Additional information: before running the query, I executed

      sql("set spark.sql.shuffle.partitions=801")

      Tables A and B have many rows, but table C is small. According to the physical plan, the A-B join was performed as a SortMergeJoin and the (A, B)-C join as a BroadcastHashJoin.

       

      After I removed the set spark.sql.shuffle.partitions statement, the query works fine.

      Is this a bug in Spark SQL?


          People

            Assignee: Unassigned
            Reporter: Yonghwan Lee (leeyh0216)
            Votes: 0
            Watchers: 3
