[HIVE-609] optimize multi-group by - ASF JIRA


Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.4.0
Component/s: Query Processor
Labels:
None

Hadoop Flags:
Reviewed
Release Note:
HIVE-609. Optimize multi-group by. (Namit Jain via zshao)

Description

For a query like:

from src
insert overwrite table dest1 select col1, count(distinct colx) group by col1
insert overwrite table dest2 select col2, count(distinct colx) group by col2;

If map-side aggregation is turned off, this query currently runs as 4 map-reduce jobs.
The plan can be optimized to run in 3 map-reduce jobs by first spraying (partitioning) the rows
on the distinct column (colx) and then aggregating the individual results for each group-by.

This may not be possible if there are multiple distinct columns, but queries like the one above are
very common in data warehousing environments.
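The 3-job plan described above can be sketched as a small in-memory Python simulation. This is a minimal illustration under stated assumptions, not Hive's implementation: the toy rows, the reducer count, and the function names `job1_spray_on_distinct` and `final_aggregate` are hypothetical. The assumption is that job 1 sprays rows on colx, so every occurrence of a given colx value lands on one reducer. Each reducer can then count distinct colx values per col1 and per col2 locally. Jobs 2 and 3 simply sum those partial counts per key. The sum is exact because no colx value is counted on two different reducers.

```python
from collections import defaultdict

# Hypothetical toy rows standing in for `src`: (col1, col2, colx).
ROWS = [
    ("a", "p", 1), ("a", "q", 2), ("a", "p", 1),
    ("b", "p", 2), ("b", "q", 3), ("a", "q", 3),
]


def job1_spray_on_distinct(rows, num_reducers):
    """Job 1 (assumed shape): partition rows on the distinct column colx.

    All rows sharing a colx value go to the same reducer, so each reducer
    emits, per group key, how many distinct colx values it saw. The colx
    sets seen by different reducers are disjoint.
    """
    partitions = defaultdict(list)
    for col1, col2, colx in rows:
        partitions[hash(colx) % num_reducers].append((col1, col2, colx))

    partial1, partial2 = [], []  # (key, partial distinct count) pairs
    for part in partitions.values():
        seen1 = defaultdict(set)  # col1 -> distinct colx values on this reducer
        seen2 = defaultdict(set)  # col2 -> distinct colx values on this reducer
        for col1, col2, colx in part:
            seen1[col1].add(colx)
            seen2[col2].add(colx)
        partial1.extend((k, len(v)) for k, v in seen1.items())
        partial2.extend((k, len(v)) for k, v in seen2.items())
    return partial1, partial2


def final_aggregate(partials):
    """Jobs 2 and 3 (assumed shape): group partial counts by key and sum them."""
    totals = defaultdict(int)
    for key, count in partials:
        totals[key] += count
    return dict(totals)


partial1, partial2 = job1_spray_on_distinct(ROWS, num_reducers=3)
dest1 = final_aggregate(partial1)  # col1, count(distinct colx) group by col1
dest2 = final_aggregate(partial2)  # col2, count(distinct colx) group by col2
print(dest1, dest2)
```

Both outputs share the single spray job, which accounts for the saving of one job in this sketch. The trick depends on there being one distinct column to spray on. This is consistent with the caveat above about multiple distinct columns.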

Attachments


- hive.609.1.patch (17/Jul/09 05:32, 90 kB, Namit Jain)
- hive.609.2.patch (17/Jul/09 19:51, 92 kB, Namit Jain)
- hive.609.3.patch (18/Jul/09 01:32, 82 kB, Namit Jain)
- hive.609.4.patch (19/Jul/09 15:20, 41 kB, Namit Jain)
- hive.609.5.patch (20/Jul/09 17:03, 2 kB, Namit Jain)
- hive.609.6.patch (20/Jul/09 17:10, 85 kB, Namit Jain)
- hive.609.7.patch (20/Jul/09 20:30, 95 kB, Namit Jain)
- hive.609.10.patch (21/Jul/09 01:04, 2 kB, Namit Jain)
- hive.609.11.patch (21/Jul/09 01:34, 115 kB, Namit Jain)

Issue Links

is depended upon by:
HIVE-3728 make optimizing multi-group by configurable (Closed)

is related to:
HIVE-878 Update the hash table entry before flushing in Group By hash aggregation (Closed)


People

Assignee: Namit Jain

Reporter: Namit Jain

Votes: 0

Watchers: 1

Dates

Created: 06/Jul/09 19:58

Updated: 21/Nov/12 04:45

Resolved: 21/Jul/09 03:18