Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- Impala 2.11.0, Impala 2.12.0
Description
I don't know whether treating every NaN as a distinct value is the desired behaviour, but it could be problematic for some users because it blows up the number of distinct groups in an aggregation. I suspect it is more useful to coalesce all NaNs into a single group, similar to how NULL is handled in GROUP BY.
[localhost:21000] > select distinct * from (values(cast("nan" as float)), (cast("nan" as float)), (sqrt(cast("-1" as float)))) v;
+----------------------+
| cast('nan' as float) |
+----------------------+
| NaN                  |
| NaN                  |
| NaN                  |
+----------------------+
Fetched 3 row(s) in 0.11s
I suspect IMPALA-6069 slightly changed the behaviour here, although it would have been broken beforehand anyway, since not all NaNs have the same bit pattern, so Equals() and Hash() were inconsistent.
We should decide what the preferred behaviour is and adjust the hash table to produce it.
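For illustration, one way to make hashing and equality agree is to canonicalize a float before either operation, so that every NaN bit pattern collapses to one representative. The sketch below is not Impala's code: `Canonicalize`, `BitsOf`, `GroupHash`, `GroupEquals` and `NanWithPayload` are hypothetical names, and it assumes 32-bit IEEE 754 floats and that coalescing NaNs (and, as in the related issues, -0/+0) is the behaviour we want.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <limits>

// Hypothetical helper: collapse every NaN to one canonical quiet NaN and
// fold -0.0 into +0.0, so values that should group together share one
// bit pattern.
inline float Canonicalize(float v) {
  if (std::isnan(v)) return std::numeric_limits<float>::quiet_NaN();
  if (v == 0.0f) return 0.0f;  // true for both -0.0 and +0.0
  return v;
}

// Reinterpret a float as its raw 32-bit pattern without undefined behaviour.
inline uint32_t BitsOf(float v) {
  uint32_t bits;
  std::memcpy(&bits, &v, sizeof(bits));
  return bits;
}

// Hash the canonical bits, so equal groups always hash identically.
inline std::size_t GroupHash(float v) {
  return std::hash<uint32_t>{}(BitsOf(Canonicalize(v)));
}

// Compare canonical bits rather than using operator==, because
// NaN == NaN is false under IEEE 754.
inline bool GroupEquals(float a, float b) {
  return BitsOf(Canonicalize(a)) == BitsOf(Canonicalize(b));
}

// Test helper (assumption: IEEE 754 binary32): build a quiet NaN that
// carries an arbitrary payload in its low mantissa bits.
inline float NanWithPayload(uint32_t payload) {
  uint32_t bits = 0x7fc00000u | (payload & 0x003fffffu);
  float v;
  std::memcpy(&v, &bits, sizeof(v));
  return v;
}
```

With this scheme, `GroupEquals(NanWithPayload(1), NanWithPayload(2))` holds and both values hash identically, so a hash-based DISTINCT or GROUP BY would place them in one group. The cost is an extra branch per float key, and whether that trade-off is acceptable depends on the decision called for above.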
Issue Links
- relates to
  - IMPALA-6069 Incorrect handling of Nan with join and codegen (Resolved)
  - IMPALA-6660 -0/+0 floating point do not compare as equal in hash table (Resolved)
  - IMPALA-1543 Positive and negative zero floats hash and compare as unequal, although they should be equal. (Resolved)