- Ashutosh Chauhan realized that the implementation of GROUPING__ID in Hive was not returning values as specified by SQL standard and other execution engines.
After digging into this, I found out that the implementation was bogus, as internally it was changing between big-endian/little-endian representation of GROUPING__ID indistinctly, and in some cases conversions in both directions were cancelling each other.
In the documentation in https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup we can already find the problem, even if we did not spot it at first.
The following query: SELECT key, value, GROUPING__ID, count(*) from T1 GROUP BY key, value WITH ROLLUP
will have the following results.
NULL NULL 0 6 1 NULL 1 2 1 NULL 3 1 1 1 3 1
Observe that value for GROUPING__ID in first row should be `3`, while for third and fourth rows, it should be `0`.