Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Reviewed
HIVE-609. Optimize multi-group by. (Namit Jain via zshao)
Description
For a query like:
from src
insert overwrite table dest1 select col1, count(distinct colx) group by col1
insert overwrite table dest2 select col2, count(distinct colx) group by col2;
If map-side aggregation is turned off, we currently run 4 map-reduce jobs.
The plan can be optimized to run in 3 map-reduce jobs by first spraying (partitioning) the rows on the
distinct column (colx) and then aggregating the partial results separately for each group by.
This may not be possible if there are multiple distinct columns, but the above query is very common
in data warehousing environments.
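
The idea of spraying on the distinct column can be illustrated with a small in-memory simulation. This is only a sketch, not Hive's actual plan or code: the function name multi_group_by_distinct, the num_partitions parameter, and the sample rows are all hypothetical. It assumes a single distinct column, as in the query above. Stage 1 stands in for the one shared job and stage 2 for the one job per group by.

```python
from collections import defaultdict


def multi_group_by_distinct(rows, group_cols, distinct_col, num_partitions=4):
    """Simulate count(distinct distinct_col) for several group-by columns.

    Hypothetical illustration only; it does not mirror Hive's operators.
    """
    # Stage 1 (shared): spray rows on the distinct column, so every
    # occurrence of a given distinct value lands in the same partition.
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(row[distinct_col]) % num_partitions].append(row)

    # Within each partition, de-duplicate (group value, distinct value)
    # pairs and count them per group value. These are partial counts.
    partials = {g: [] for g in group_cols}
    for part in partitions.values():
        for g in group_cols:
            seen = {(row[g], row[distinct_col]) for row in part}
            counts = defaultdict(int)
            for group_value, _ in seen:
                counts[group_value] += 1
            partials[g].append(dict(counts))

    # Stage 2 (one pass per group-by column): sum the partial counts.
    # The sum is exact because a distinct value never spans partitions.
    results = {}
    for g in group_cols:
        totals = defaultdict(int)
        for counts in partials[g]:
            for group_value, n in counts.items():
                totals[group_value] += n
        results[g] = dict(totals)
    return results


# Made-up sample rows standing in for table src.
rows = [
    {"col1": "a", "col2": "x", "colx": 1},
    {"col1": "a", "col2": "y", "colx": 1},
    {"col1": "a", "col2": "x", "colx": 2},
    {"col1": "b", "col2": "y", "colx": 2},
    {"col1": "b", "col2": "y", "colx": 2},
]
result = multi_group_by_distinct(rows, ["col1", "col2"], "colx")
```

Because the rows are partitioned on colx alone, the same stage 1 output serves both group-by columns. With several distinct columns, no single partitioning would keep every distinct value in one place, which is consistent with the caveat above.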