Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-16102

Grouping sets do not conform to SQL standard

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • 1.3.0, 2.2.0
    • 2.3.0
    • Operators, Parser
    • None

    Description

      1. ashutoshc realized that the implementation of GROUPING__ID in Hive was not returning values as specified by SQL standard and other execution engines.

      After digging into this, I found out that the implementation was bogus, as internally it was changing between big-endian/little-endian representation of GROUPING__ID indistinctly, and in some cases conversions in both directions were cancelling each other.

      In the documentation in https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup we can already find the problem, even if we did not spot it at first.

      The following query: SELECT key, value, GROUPING__ID, count(*) from T1 GROUP BY key, value WITH ROLLUP
      will have the following results.

      NULL NULL 0 6
      1 NULL 1 2
      1 NULL 3 1
      1 1 3 1

      ...

      Observe that value for GROUPING__ID in first row should be `3`, while for third and fourth rows, it should be `0`.

      Attachments

        1. HIVE-16102.patch
          15 kB
          Jesus Camacho Rodriguez
        2. HIVE-16102.01.patch
          17 kB
          Jesus Camacho Rodriguez
        3. HIVE-16102.02.patch
          58 kB
          Jesus Camacho Rodriguez

        Issue Links

          Activity

            People

              jcamachorodriguez Jesus Camacho Rodriguez
              jcamachorodriguez Jesus Camacho Rodriguez
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: