Spark / SPARK-35207

hash() and other hash builtins do not normalize negative zero


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.1
    • Fix Version/s: 3.2.0
    • Component/s: SQL

    Description

      I would generally expect that x = y => hash(x) = hash(y). However, +0.0 and -0.0 hash to different values for the floating-point types.

      scala> spark.sql("select hash(cast('0.0' as double)), hash(cast('-0.0' as double))").show
      +-------------------------+--------------------------+
      |hash(CAST(0.0 AS DOUBLE))|hash(CAST(-0.0 AS DOUBLE))|
      +-------------------------+--------------------------+
      |              -1670924195|                -853646085|
      +-------------------------+--------------------------+
      scala> spark.sql("select cast('0.0' as double) == cast('-0.0' as double)").show
      +--------------------------------------------+
      |(CAST(0.0 AS DOUBLE) = CAST(-0.0 AS DOUBLE))|
      +--------------------------------------------+
      |                                        true|
      +--------------------------------------------+
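
      For context: IEEE 754 encodes +0.0 and -0.0 with distinct bit patterns (only the sign bit differs) even though they compare equal, which is consistent with the hashes above being computed from the value's raw bits. A quick illustration in the plain Scala REPL (illustrative only, not Spark's hashing code):

      scala> 0.0 == -0.0
      res0: Boolean = true

      scala> java.lang.Double.doubleToLongBits(0.0)
      res1: Long = 0

      scala> java.lang.Double.doubleToLongBits(-0.0)
      res2: Long = -9223372036854775808

      The last value is Long.MinValue, i.e. only the sign bit is set, so any hash derived from these bits will distinguish the two zeros.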
      

      I'm not sure how likely this is to cause issues in practice: only a limited set of calculations can produce -0.0, and joining or aggregating on floating-point keys is bad practice as a general rule. Still, I think it would be safer if we normalised -0.0 to +0.0 before hashing.
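
      A minimal sketch of that normalisation in plain Scala (a hypothetical helper, not the actual patch): since -0.0 == 0.0 evaluates to true, comparing against the positive literal collapses both zeros to +0.0 while leaving every other value untouched.

      object NormalizeZero {
        // Hypothetical helper, not the actual fix: map -0.0 to +0.0 (and
        // -0.0f to +0.0f) so that values that compare equal also hash equal.
        // NaN == 0.0 is false, so NaN passes through unchanged.
        def normalize(d: Double): Double =
          if (d == 0.0) 0.0 else d

        def normalize(f: Float): Float =
          if (f == 0.0f) 0.0f else f
      }

      After this normalisation, doubleToLongBits returns the same bits for both inputs, so any bit-based hash agrees.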

    People

      Assignee: Pablo Langa Blanco (planga82)
      Reporter: Tim Armstrong (tarmstrong)
      Votes: 0
      Watchers: 3
