[HADOOP-4139] [Hive] multi group by statement is not optimized - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.19.0
Component/s: None
Labels: None
Hadoop Flags: Reviewed

Description

A simple multi-group-by statement is not optimized. For example, a statement like:

FROM SRC
INSERT OVERWRITE TABLE DEST1 SELECT SRC.key, count(distinct SUBSTR(SRC.value,4)) GROUP BY SRC.key
INSERT OVERWRITE TABLE DEST2 SELECT SRC.key, count(distinct SUBSTR(SRC.value,4)) GROUP BY SRC.key;

results in two copies of the data (SRC) being made. Instead, the data can first be partially aggregated on the distinct value and then aggregated again for each destination.
The first step can be shared by all of the group bys.
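
The two-step idea above can be sketched outside Hive. The following Python is only an illustration of the proposed plan shape, not the code in the attached patches: the function names and the sample rows are hypothetical, and SUBSTR(SRC.value,4) is modelled as value[3:] because Hive's SUBSTR is 1-indexed.

```python
from collections import Counter


def shared_partial_aggregate(rows):
    """Step 1 (shared): scan SRC once and keep only the distinct
    (key, SUBSTR(value, 4)) pairs. Hypothetical helper for illustration."""
    return {(key, value[3:]) for key, value in rows}


def count_distinct_per_key(pairs):
    """Step 2 (per destination): count the distinct pairs for each key."""
    return dict(Counter(key for key, _ in pairs))


# Made-up sample rows standing in for the SRC table.
src = [("a", "val_1"), ("a", "val_1"), ("a", "val_2"), ("b", "val_3")]

# SRC is read once; both destinations reuse the same partial result.
pairs = shared_partial_aggregate(src)
dest1 = count_distinct_per_key(pairs)
dest2 = count_distinct_per_key(pairs)
```

Here the source rows are scanned a single time, and each INSERT only has to run the cheaper second step over the already de-duplicated pairs.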

Attachments

patch1 | 09/Sep/08 23:59 | 28 kB | Namit Jain
patch3 | 11/Sep/08 01:27 | 29 kB | Namit Jain
patch4.txt | 13/Sep/08 00:35 | 30 kB | Namit Jain

People

Assignee: Namit Jain

Reporter: Namit Jain

Votes: 0

Watchers: 0

Dates

Created: 09/Sep/08 23:46

Updated: 08/Jul/09 17:06

Resolved: 17/Sep/08 00:28