Details
-
New Feature
-
Status: Open
-
Major
-
Resolution: Unresolved
-
Impala 3.0, Impala 2.12.0
-
None
-
ghx-label-8
Description
Now suppose that we'd like to analyze our sales data, to study the amount of sales that is occurring for different products, in different states and regions. Using the ROLLUP feature of SQL 2003, we could issue the query:
select region, state, product, sum(sales) total_sales from sales_history group by rollup (region, state, product)
Semantically, the above query is equivalent to
select region, state, product, sum(sales) total_sales from sales_history group by region, state, product union select region, state, null, sum(sales) total_sales from sales_history group by region, state union select region, null, null, sum(sales) total_sales from sales_history group by region union select null, null, null, sum(sales) total_sales from sales_history
The query might produce results that looked something like:
REGION STATE PRODUCT TOTAL_SALES ------ ----- ------- ----------- null null null 6200 EAST MA BOATS 100 EAST MA CARS 1500 EAST MA null 1600 EAST NY BOATS 150 EAST NY CARS 1000 EAST NY null 1150 EAST null null 2750 WEST CA BOATS 750 WEST CA CARS 500 WEST CA null 1250 WEST AZ BOATS 2000 WEST AZ CARS 200 WEST AZ null 2200 WEST null null 3450
We have a lot of production queries that work around this missing Impala functionality by having three UNION ALLs. Physical execution plan shows Impala actually reads full fact table three times. So it could be a three times improvement (or more, depending on number of columns that are being rolled up).
I can't find another SQL on Hadoop engine that doesn't support this feature.
Checked Spark, Hive, PIG, Flink and some other engines - they all do support this basic SQL feature.
Would be great to have a matching feature in Impala too.
Attachments
Issue Links
- depends upon
-
IMPALA-110 Add support for multiple distinct operators in the same query block
-
- Resolved
-
- is duplicated by
-
IMPALA-9298 Support Cube, Grouping Sets and Rollup to enhanced group by clause
-
- Resolved
-
- relates to
-
FLINK-5303 Add CUBE/ROLLUP/GROUPING SETS operator in SQL
-
- Resolved
-
-
HIVE-2397 Support with rollup option for group by
-
- Resolved
-
-
SPARK-2663 Support the GroupingSet/ROLLUP/CUBE
-
- Resolved
-
-
PIG-4066 An optimization for ROLLUP operation in Pig
-
- Closed
-
1.
|
Support distinct aggregates and grouping sets in the same query block |
|
Open | Unassigned |