Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Version: 3.1.1
Description
I would generally expect that x = y implies hash(x) = hash(y). However, +0.0 and -0.0 hash to different values for floating-point types.
scala> spark.sql("select hash(cast('0.0' as double)), hash(cast('-0.0' as double))").show
+-------------------------+--------------------------+
|hash(CAST(0.0 AS DOUBLE))|hash(CAST(-0.0 AS DOUBLE))|
+-------------------------+--------------------------+
|              -1670924195|               -853646085|
+-------------------------+--------------------------+

scala> spark.sql("select cast('0.0' as double) == cast('-0.0' as double)").show
+--------------------------------------------+
|(CAST(0.0 AS DOUBLE) = CAST(-0.0 AS DOUBLE))|
+--------------------------------------------+
|                                        true|
+--------------------------------------------+
I'm not sure how likely this is to cause issues in practice: only a limited number of calculations can produce -0.0, and joining or aggregating on floating-point keys is bad practice as a general rule. Still, I think it would be safer if we normalised -0.0 to +0.0 before hashing.
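The root cause is that -0.0 and +0.0 compare equal under IEEE 754 but have different bit patterns, so any hash computed over the raw bits (e.g. via Double.doubleToLongBits) diverges. A minimal sketch of the proposed normalisation, in plain Java since the behaviour lives in JVM double semantics; the class and method names here are hypothetical, not Spark APIs:

```java
public class NormalizeZero {
    // -0.0 == 0.0 is true, so this maps -0.0 to +0.0 while leaving
    // every other value (including NaN, since NaN != 0.0) unchanged.
    static double normalize(double d) {
        return d == 0.0d ? 0.0d : d;
    }

    public static void main(String[] args) {
        // Raw bit patterns differ, which is why a bits-based hash diverges:
        System.out.println(Double.doubleToLongBits(0.0d));   // 0
        System.out.println(Double.doubleToLongBits(-0.0d));  // sign bit set

        // After normalisation the bit patterns (and hence hashes) agree:
        System.out.println(
            Double.doubleToLongBits(normalize(-0.0d))
                == Double.doubleToLongBits(0.0d));           // true
    }
}
```

Hashing normalize(d) instead of d restores the invariant x = y implies hash(x) = hash(y) for zeros, at the cost of one comparison per value.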