Now suppose that we'd like to analyze our sales data to study the volume of sales for different products in different states and regions. Using the ROLLUP feature of SQL:2003, we could issue the query:
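A minimal sketch of such a query follows. The table name `sales_history` and the columns `region`, `state`, `product` and `sales` are assumptions chosen for illustration; they do not come from a real schema.

```sql
-- Hypothetical table: sales_history(region, state, product, sales)
-- ROLLUP aggregates at every prefix of the column list:
--   (region, state, product), (region, state), (region), and the grand total.
SELECT region,
       state,
       product,
       SUM(sales) AS total_sales
FROM sales_history
GROUP BY ROLLUP (region, state, product);
```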
Semantically, the above query is equivalent to
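As a sketch of that equivalence, using the same hypothetical `sales_history` table and column names (assumptions, not a real schema), a three-column rollup expands into one aggregation per level, combined with UNION ALL:

```sql
-- Hypothetical table: sales_history(region, state, product, sales)
-- Level 1: full detail
SELECT region, state, product, SUM(sales) AS total_sales
FROM sales_history
GROUP BY region, state, product
UNION ALL
-- Level 2: product rolled up
SELECT region, state, NULL, SUM(sales)
FROM sales_history
GROUP BY region, state
UNION ALL
-- Level 3: state and product rolled up
SELECT region, NULL, NULL, SUM(sales)
FROM sales_history
GROUP BY region
UNION ALL
-- Level 4: grand total
SELECT NULL, NULL, NULL, SUM(sales)
FROM sales_history;
```

Each branch scans `sales_history` separately, which is why this form costs one full table read per aggregation level. Some engines may require the NULL placeholders to be cast explicitly to the column type.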
The query might produce results that look something like:
We have a lot of production queries that work around this missing Impala functionality by using three UNION ALLs. The physical execution plan shows that Impala actually reads the full fact table three times, so native support could give a threefold improvement (or more, depending on the number of columns being rolled up).
I can't find another SQL-on-Hadoop engine that doesn't support this feature.
I checked Spark, Hive, Pig, Flink and some other engines; they all support this basic SQL feature.
It would be great to have a matching feature in Impala too.
Parser support for ROLLUP, CUBE and GROUPING SETS
Plan generation and execution for ROLLUP, CUBE and GROUPING SETS
Support distinct aggregates and grouping sets in the same query block
Support grouping() and grouping_id() functions
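
For reference, here is a sketch of the standard SQL forms these sub-tasks cover, again against the hypothetical `sales_history` table. It shows the standard syntax only and is not a statement of what any particular Impala release accepts.

```sql
-- Hypothetical table: sales_history(region, state, product, sales)
-- GROUPING SETS lists the aggregation levels explicitly; this list is
-- the same set of levels that ROLLUP (region, state) would produce.
-- GROUPING(state) returns 1 on rows where state has been aggregated away,
-- which distinguishes a rolled-up NULL from a NULL stored in the data.
SELECT region,
       state,
       SUM(sales)      AS total_sales,
       GROUPING(state) AS state_rolled_up
FROM sales_history
GROUP BY GROUPING SETS ((region, state), (region), ());

-- CUBE aggregates over every combination of the listed columns:
-- (region, state), (region), (state), and the grand total.
SELECT region, state, SUM(sales) AS total_sales
FROM sales_history
GROUP BY CUBE (region, state);
```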