IMPALA-6661

Group by float results in one group per NaN value

Details

    Description

      I don't know if this is the desired behaviour, but it could be problematic for some users because it inflates the number of distinct groups in an aggregation. It would probably be more useful to coalesce all NaNs into a single group, similar to how NULL is handled in GROUP BY.

      [localhost:21000] > select distinct * from (values(cast("nan" as float)), (cast("nan" as float)), (sqrt(cast("-1" as float)))) v;
      +----------------------+
      | cast('nan' as float) |
      +----------------------+
      | NaN                  |
      | NaN                  |
      | NaN                  |
      +----------------------+
      Fetched 3 row(s) in 0.11s
      

      I suspect IMPALA-6069 slightly changed the behaviour here, although it would have been broken beforehand anyway, since not all NaNs have the same bit pattern, so Equals() and Hash() were inconsistent.
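
To illustrate the point about bit patterns, here is a minimal Python sketch, not Impala code. It uses 64-bit doubles rather than FLOAT for convenience, and the helper names (`double_from_bits`, `bits_from_double`, `ieee_equals`, `bitwise_hash`) are hypothetical stand-ins for the kind of Equals() and Hash() behaviour described above.

```python
import math
import struct


def double_from_bits(bits: int) -> float:
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def bits_from_double(value: float) -> int:
    """Return the raw 64-bit pattern of a double."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


# Two quiet NaNs that differ in sign bit and payload.
nan_a = double_from_bits(0x7FF8000000000000)
nan_b = double_from_bits(0xFFF8000000000001)


def ieee_equals(x: float, y: float) -> bool:
    """Equality by IEEE comparison: a NaN never equals anything, itself included."""
    return x == y


def bitwise_hash(x: float) -> int:
    """Hash computed from the raw bytes, so it depends on the NaN payload."""
    return hash(bits_from_double(x))


print(math.isnan(nan_a), math.isnan(nan_b))
print(ieee_equals(nan_a, nan_a))
print(hex(bits_from_double(nan_a)), hex(bits_from_double(nan_b)))
```

Under these assumptions both values are NaN, yet they never compare equal and their raw bits differ, so neither an equality rule nor a raw-byte hash of this kind places them in the same group.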

      We should decide what the preferred behaviour is and tweak the behaviour of the hash table to produce it.
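
One of the options, a single group for all NaNs, could be sketched as follows. This is a Python illustration only, assuming the grouping key is normalized before it reaches the hash table; `CANONICAL_NAN`, `group_key`, `count_groups_naive` and `count_groups_canonical` are hypothetical names and say nothing about how Impala's hash table is actually structured.

```python
import math
from collections import Counter

# Hypothetical sentinel that stands in for every NaN in the grouping key.
CANONICAL_NAN = "NaN"


def group_key(value):
    """Map every NaN to one canonical key; NULL (None) and other values pass through."""
    if value is not None and math.isnan(value):
        return CANONICAL_NAN
    return value


def count_groups_naive(values):
    """Group on the raw values: each NaN object ends up in its own group."""
    return Counter(values)


def count_groups_canonical(values):
    """Group on the normalized key: all NaNs share one group, like NULL."""
    return Counter(group_key(v) for v in values)


values = [float("nan"), float("nan"), float("nan"), 1.5, None, None]
print(len(count_groups_naive(values)))
print(len(count_groups_canonical(values)))
```

With the raw values the three NaNs form three separate groups, while the normalized key leaves one NaN group next to the groups for 1.5 and NULL.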

            People

              mostrows Michal Ostrowski
              tarmstrong Tim Armstrong
              Votes: 0
              Watchers: 9
