[IMPALA-7204] Add support for GROUP BY ROLLUP, CUBE and GROUPING SETS - ASF JIRA

XML

Word

Printable

JSON

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: Impala 3.0, Impala 2.12.0
Fix Version/s: None
Component/s: Backend
Labels:
- sql-language
- tpc-ds

Target Version:

Product Backlog
Epic Color:
ghx-label-8

Description

Now suppose that we'd like to analyze our sales data, to study the amount of sales that is occurring for different products, in different states and regions. Using the ROLLUP feature of SQL 2003, we could issue the query:

select region, state, product, sum(sales) total_sales
from sales_history 
group by rollup (region, state, product)

Semantically, the above query is equivalent to

select region, state, product, sum(sales) total_sales
from sales_history 
group by region, state, product
union
select region, state, null, sum(sales) total_sales
from sales_history 
group by region, state
union
select region, null, null, sum(sales) total_sales
from sales_history 
group by region
union
select null, null, null, sum(sales) total_sales
from sales_history

The query might produce results that looked something like:

REGION STATE PRODUCT TOTAL_SALES
------ ----- ------- -----------
null null null 6200
EAST MA BOATS 100
EAST MA CARS 1500
EAST MA null 1600
EAST NY BOATS 150
EAST NY CARS 1000
EAST NY null 1150
EAST null null 2750
WEST CA BOATS 750
WEST CA CARS 500
WEST CA null 1250
WEST AZ BOATS 2000
WEST AZ CARS 200
WEST AZ null 2200
WEST null null 3450

We have a lot of production queries that work around this missing Impala functionality by having three UNION ALLs. Physical execution plan shows Impala actually reads full fact table three times. So it could be a three times improvement (or more, depending on number of columns that are being rolled up).

I can't find another SQL on Hadoop engine that doesn't support this feature.
Checked Spark, Hive, PIG, Flink and some other engines - they all do support this basic SQL feature.

Would be great to have a matching feature in Impala too.

Attachments

Issue Links

depends upon

IMPALA-110 Add support for multiple distinct operators in the same query block

Resolved

is duplicated by

IMPALA-9298 Support Cube, Grouping Sets and Rollup to enhanced group by clause

Resolved

relates to

FLINK-5303 Add CUBE/ROLLUP/GROUPING SETS operator in SQL

Resolved

HIVE-2397 Support with rollup option for group by

Resolved

SPARK-2663 Support the GroupingSet/ROLLUP/CUBE

Resolved

PIG-4066 An optimization for ROLLUP operation in Pig

Closed

(1 relates to)

Sub-Tasks

1.	Parser support for ROLLUP, CUBE and GROUPING SETS	Resolved	Tim Armstrong
2.	Plan generation and execution for ROLLUP, CUBE and GROUPING SETS	Resolved	Aman Sinha
3.	Support distinct aggregates and grouping sets in the same query block	Open	Unassigned
4.	Support grouping() and grouping_id() functions	Resolved	Tim Armstrong

Activity

People

Assignee:: Unassigned

Reporter:: Ruslan Dautkhanov

Votes:: 2 Vote for this issue

Watchers:: 7 Start watching this issue

Dates

Created:: 23/Jun/18 16:42

Updated:: 25/Jun/20 22:46