[HIVE-931] Optimize GROUP BY aggregations where key is a sorted/bucketed column - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.5.0
Component/s: Query Processor
Labels: None
Hadoop Flags: Reviewed

Description

If a table is sorted by a given key, we currently do not take advantage of that ordering for GROUP BY. Doing so can be very useful.

For example, if T is sorted by column c1, then for the query

select c1, aggr() from T group by c1

we can always use a single map-reduce job. No hash table is needed on the mapper, since the data is already sorted by c1.

This reduces memory pressure on the mapper and removes the overhead of maintaining the hash table.
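
The difference between the two map-side strategies can be illustrated with a minimal Python sketch. This is not Hive's implementation; the function names `hash_aggregate` and `sorted_stream_aggregate` are hypothetical, and `sum` stands in for the unspecified `aggr()`. It only shows why sorted input lets the aggregation keep one running group instead of one hash-table entry per distinct key.

```python
from itertools import groupby
from operator import itemgetter


def hash_aggregate(rows):
    """Aggregate unsorted (key, value) rows: one hash-table entry per distinct key."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return sorted(totals.items())


def sorted_stream_aggregate(rows):
    """Aggregate rows that are already sorted by key.

    Equal keys are adjacent, so only the current group is held in memory
    and each result is emitted as soon as the key changes.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


# Hypothetical rows of table T, already sorted by c1.
rows = [("a", 1), ("a", 2), ("b", 5), ("c", 1), ("c", 4)]

# Both strategies agree on sorted input.
assert list(sorted_stream_aggregate(rows)) == hash_aggregate(rows)
```

Note that the streaming version is only correct when equal keys are adjacent, which is exactly the guarantee that a table sorted by c1 provides.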

Attachments

- hive-931-2009-11-18.patch, 19/Nov/09 06:48, 310 kB, He Yongqiang
- hive-931-2009-11-19.patch, 20/Nov/09 03:30, 326 kB, He Yongqiang
- hive-931-2009-11-20.3.patch, 20/Nov/09 20:25, 416 kB, He Yongqiang
- hive-931-2009-11-21.patch, 22/Nov/09 07:02, 416 kB, He Yongqiang
- hive-931-2009-12-01.patch, 02/Dec/09 06:03, 438 kB, He Yongqiang
- hive-931-2009-12-03.patch, 03/Dec/09 23:21, 449 kB, He Yongqiang

People

Assignee: He Yongqiang

Reporter: Namit Jain

Votes: 1

Watchers: 3

Dates

Created: 13/Nov/09 20:21

Updated: 17/Dec/11 00:05

Resolved: 04/Dec/09 04:39