[HIVE-9495] Map Side aggregation affecting map performance - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 0.14.0
Fix Version/s: None
Component/s: Query Processor
Labels: None
Environment: RHEL 6.4, Hortonworks Hadoop 2.2

Description

With hive.map.aggr=true, map tasks for a simple aggregation query take far longer in Hive 0.14 than with hive.map.aggr=false.

For example, consider the query:

INSERT OVERWRITE TABLE lineitem_tgt_agg
select alias.a0 as a0,
 alias.a2 as a1,
 alias.a1 as a2,
 alias.a3 as a3,
 alias.a4 as a4
from (
 select alias.a0 as a0,
  SUM(alias.a1) as a1,
  SUM(alias.a2) as a2,
  SUM(alias.a3) as a3,
  SUM(alias.a4) as a4
 from (
  select lineitem_sf500.l_orderkey as a0,
   CAST(lineitem_sf500.l_quantity * lineitem_sf500.l_extendedprice * (1 - lineitem_sf500.l_discount) * (1 + lineitem_sf500.l_tax) as double) as a1,
   lineitem_sf500.l_quantity as a2,
   CAST(lineitem_sf500.l_quantity * lineitem_sf500.l_extendedprice * lineitem_sf500.l_discount as double) as a3,
   CAST(lineitem_sf500.l_quantity * lineitem_sf500.l_extendedprice * lineitem_sf500.l_tax as double) as a4
  from lineitem_sf500
  ) alias
 group by alias.a0
 ) alias;

The above query was run against ~376 GB of data (~3 billion records) in the source table.
It takes ~10 minutes with hive.map.aggr=false.
With hive.map.aggr=true, the map tasks do not complete even after an hour.
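
The two runs compared above differ only in one session setting. A minimal sketch of how the comparison might be reproduced, assuming a Hive CLI or Beeline session: the three properties printed at the end are standard Hive settings that govern the map-side hash table, but they are not mentioned in this report and are included only as candidates worth inspecting.

```sql
-- Run 1: map-side aggregation disabled (the ~10 minute case).
SET hive.map.aggr=false;
-- ... run the INSERT OVERWRITE query above ...

-- Run 2: map-side aggregation enabled (the case where map tasks do not finish).
SET hive.map.aggr=true;
-- ... run the same query again ...

-- Assumption: not part of the original report. SET with no value prints the
-- current setting, which helps when comparing environments.
SET hive.map.aggr.hash.percentmemory;
SET hive.map.aggr.hash.min.reduction;
SET hive.groupby.mapaggr.checkinterval;
```

Prefixing the query with EXPLAIN in each session is one way to check whether the plan actually changes between the two settings.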

Attachments


HIVE-9495.1.patch.txt, 12/Feb/15 07:50, 43 kB, Navis Ryu
HIVE-9495.2.patch.txt, 24/Feb/15 05:06, 19 kB, Navis Ryu
profiler_screenshot.PNG, 28/Jan/15 08:56, 60 kB, Anand Sridharan

Issue Links

duplicates: HIVE-11502 Map side aggregation is extremely slow (Closed)

People

Assignee: Anand Sridharan

Reporter: Anand Sridharan

Votes: 0

Watchers: 6

Dates

Created: 28/Jan/15 08:53

Updated: 30/Oct/15 08:25

Resolved: 30/Oct/15 08:25