Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
1.2.0
-
None
Description
For the query as following:
create table tbl2 as select col1, max(col2) as col2 from tbl1 group by col1;
If the column for group by has many different values (for example 400000) and it is in type double, the map side aggregation is very slow. I ran the query which took more than 3 hours , after 3 hours, I have to kill the query.
The same query can finish in 7 seconds, if I turn off map side aggregation by:
set hive.map.aggr = false;
Attachments
Attachments
Issue Links
- is duplicated by
-
HIVE-9495 Map Side aggregation affecting map performance
- Resolved
-
HIVE-12093 launch local task to process map join cost long time
- Resolved
- is related to
-
HADOOP-12217 hashCode in DoubleWritable returns same value for many numbers
- Patch Available
- relates to
-
HIVE-11761 DoubleWritable hashcode for GroupBy is not properly generated
- Closed