[HIVE-20153] Count and Sum UDF consume more memory in Hive 2+ - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3.2
Fix Version/s: 4.0.0-alpha-1
Component/s: UDF
Labels: None

Description

While testing Hive 2, we noticed that queries with many count() and sum() aggregations run out of memory on the Hadoop side, although the same queries worked in Hive 1.

For many queries we have to double the mapper memory setting (in our case, raising mapreduce.map.java.opts from -Xmx2000M to -Xmx4000M), which makes upgrading to Hive 2 difficult.

A heap dump shows that one of the main culprits is the field 'uniqueObjects' in GenericUDAFSum and GenericUDAFCount, which was added to support window functions.
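
To illustrate why a per-buffer set can be costly, here is a minimal sketch. It is not the Hive source and not the attached patch: the class names EagerCountBuffer and LazyCountBuffer, and the `distinct` flag, are hypothetical and exist only for illustration. The assumption is that every aggregation buffer allocates a HashSet up front even when no distinct handling is needed, so a query with many groups and many count()/sum() columns pays for many empty sets. Allocating the set only on first distinct use is one possible way to avoid that cost.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: hypothetical buffers, not Hive's GenericUDAFCount/GenericUDAFSum.
class UniqueObjectsSketch {

    /** Allocates the set for every buffer, even when distinct is never requested. */
    static class EagerCountBuffer {
        final Set<Object> uniqueObjects = new HashSet<>();
        long value;

        void add(Object row, boolean distinct) {
            if (distinct && !uniqueObjects.add(row)) {
                return; // already counted this value
            }
            value++;
        }
    }

    /** Allocates the set only when the first distinct row arrives. */
    static class LazyCountBuffer {
        Set<Object> uniqueObjects; // stays null for plain count()
        long value;

        void add(Object row, boolean distinct) {
            if (distinct) {
                if (uniqueObjects == null) {
                    uniqueObjects = new HashSet<>();
                }
                if (!uniqueObjects.add(row)) {
                    return; // already counted this value
                }
            }
            value++;
        }
    }

    public static void main(String[] args) {
        int buffers = 100_000; // e.g. groups times aggregation columns (assumed figure)
        int allocatedSets = 0;
        for (int i = 0; i < buffers; i++) {
            LazyCountBuffer b = new LazyCountBuffer();
            b.add("row", false); // plain count(), no distinct
            if (b.uniqueObjects != null) {
                allocatedSets++;
            }
        }
        System.out.println("lazy buffers holding a set: " + allocatedSets);
        System.out.println("eager buffers holding a set: " + buffers);
    }
}
```

In this sketch both buffers produce the same counts; they differ only in whether a set object exists for aggregations that never use it.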

Attachments

HIVE-20153.1.patch, 3 kB, 26/Jul/18 20:08, Aihua Xu
Screen Shot 2018-07-12 at 6.41.28 PM.png, 49 kB, 12/Jul/18 16:44, Szehon Ho

People

Assignee: Aihua Xu

Reporter: Szehon Ho

Votes: 0

Watchers: 6

Dates

Created: 12/Jul/18 16:35

Updated: 17/Nov/22 08:54

Resolved: 27/Jul/18 20:50