Spark / SPARK-35207

hash() and other hash builtins do not normalize negative zero


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.1
    • Fix Version/s: 3.2.0
    • Component/s: SQL

    Description

      I would generally expect that x = y => hash(x) = hash(y). However, +0.0 and -0.0 hash to different values for the floating-point types.

      scala> spark.sql("select hash(cast('0.0' as double)), hash(cast('-0.0' as double))").show
      +-------------------------+--------------------------+
      |hash(CAST(0.0 AS DOUBLE))|hash(CAST(-0.0 AS DOUBLE))|
      +-------------------------+--------------------------+
      |              -1670924195|                -853646085|
      +-------------------------+--------------------------+
      scala> spark.sql("select cast('0.0' as double) == cast('-0.0' as double)").show
      +--------------------------------------------+
      |(CAST(0.0 AS DOUBLE) = CAST(-0.0 AS DOUBLE))|
      +--------------------------------------------+
      |                                        true|
      +--------------------------------------------+
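
      For context: IEEE 754 encodes +0.0 and -0.0 with distinct bit patterns (only the sign bit differs) even though they compare equal, which is consistent with the hashes above being computed from the value's raw bits. A quick illustration in the plain Scala REPL (illustrative only, not Spark's hashing code):

      scala> 0.0 == -0.0
      res0: Boolean = true

      scala> java.lang.Double.doubleToLongBits(0.0)
      res1: Long = 0

      scala> java.lang.Double.doubleToLongBits(-0.0)
      res2: Long = -9223372036854775808

      The last value is Long.MinValue, i.e. only the sign bit is set, so any hash derived from these bits will distinguish the two zeros.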
      

      I'm not sure how likely this is to cause issues in practice: only a limited set of calculations can produce -0.0, and joining or aggregating on floating-point keys is bad practice as a general rule. Still, I think it would be safer if we normalised -0.0 to +0.0 before hashing.
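
      A minimal sketch of that normalisation in plain Scala (a hypothetical helper, not the actual patch): since -0.0 == 0.0 evaluates to true, comparing against the positive literal collapses both zeros to +0.0 while leaving every other value untouched.

      object NormalizeZero {
        // Hypothetical helper, not the actual fix: map -0.0 to +0.0 (and
        // -0.0f to +0.0f) so that values that compare equal also hash equal.
        // NaN == 0.0 is false, so NaN passes through unchanged.
        def normalize(d: Double): Double =
          if (d == 0.0) 0.0 else d

        def normalize(f: Float): Float =
          if (f == 0.0f) 0.0f else f
      }

      After this normalisation, doubleToLongBits returns the same bits for both inputs, so any bit-based hash agrees.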

    People

      Assignee: Pablo Langa Blanco (planga82)
      Reporter: Tim Armstrong (tarmstrong)
      Votes: 0
      Watchers: 3
