SPARK-26021: -0.0 and 0.0 not treated consistently, doesn't match Hive


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL

      Description

      Per Alon Doron and Matthew Cheah and SPARK-24834, I'm splitting this out as a new issue:

      The underlying issue is how Spark and Hive treat 0.0 and -0.0, which are numerically identical but not the same double value:

      In Hive, 0.0 and -0.0 compare equal since https://issues.apache.org/jira/browse/HIVE-11174.
      That's not the case in Spark SQL, where "group by" (non-codegen) treats them as different values: because their hashes differ, they are placed in different buckets of UnsafeFixedWidthAggregationMap.
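      The root cause can be seen without Spark at all. A minimal JVM-only sketch of why the two zeros hash apart: they compare equal numerically, but their IEEE 754 bit patterns differ in the sign bit, and a hash derived from the raw bits (as `java.lang.Double.hashCode` is) therefore differs too.

```scala
// Sketch: 0.0 and -0.0 are numerically equal but not the same double value.
object ZeroHashDemo {
  def main(args: Array[String]): Unit = {
    val pos = 0.0d
    val neg = -0.0d

    // Numerically equal per IEEE 754 comparison rules:
    println(pos == neg)                              // true

    // But the underlying bit patterns differ in the sign bit:
    println(java.lang.Double.doubleToLongBits(pos))  // 0
    println(java.lang.Double.doubleToLongBits(neg))  // -9223372036854775808 (only the sign bit set)

    // Double.hashCode is computed from the bit pattern, so the hashes differ,
    // which is why the two zeros land in different aggregation buckets:
    println(java.lang.Double.hashCode(pos) == java.lang.Double.hashCode(neg)) // false
  }
}
```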

      In addition, there's an inconsistency when using codegen. For example, the following snippets:

      println(Seq(0.0d, 0.0d, -0.0d).toDF("i").groupBy("i").count().collect().mkString(", "))
      

      [0.0,3]

      println(Seq(0.0d, -0.0d, 0.0d).toDF("i").groupBy("i").count().collect().mkString(", "))
      

      [0.0,1], [-0.0,2]

      spark.conf.set("spark.sql.codegen.wholeStage", "false")
      println(Seq(0.0d, -0.0d, 0.0d).toDF("i").groupBy("i").count().collect().mkString(", "))
      

      [0.0,2], [-0.0,1]

      Note that the only difference between the first two snippets is the order of the elements in the Seq.
      This inconsistency results from the different partitioning of the Seq and from the use of the generated fast hash map in the first, partial, aggregation.

      It looks like we need to add a specific check for -0.0 before hashing (in both codegen and non-codegen modes) if we want to fix this.
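      A minimal sketch of that idea (not Spark's actual code; the helper name `normalize` is hypothetical): map -0.0 to 0.0 before hashing, so both zeros produce the same hash and fall into the same bucket.

```scala
// Sketch of normalizing -0.0 to +0.0 before hashing.
object NormalizeZero {
  // Since -0.0 == 0.0 is true, this maps both zeros to +0.0
  // and leaves every other value (including NaN) untouched.
  def normalize(d: Double): Double =
    if (d == 0.0d) 0.0d else d

  def main(args: Array[String]): Unit = {
    val a = NormalizeZero.normalize(0.0d)
    val b = NormalizeZero.normalize(-0.0d)
    // After normalization the bit patterns, and hence the hashes, agree:
    println(java.lang.Double.doubleToLongBits(a) == java.lang.Double.doubleToLongBits(b)) // true
    println(java.lang.Double.hashCode(a) == java.lang.Double.hashCode(b))                 // true
  }
}
```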

            People

            • Assignee: Alon Doron (adoron)
            • Reporter: Sean R. Owen (srowen)
            • Votes: 0
            • Watchers: 6
