Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
2.3.2
-
None
Description
While playing with Hive2, we noticed that queries with a lot of count() and sum() aggregations run out of memory on Hadoop side where they worked before in Hive1.
In many queries, we have to double the Mapper Memory settings (in our particular case mapreduce.map.java.opts from -Xmx2000M to -Xmx4000M), it makes it not so easy to upgrade to Hive 2.
Taking heap dump, we see one of the main culprit is the field 'uniqueObjects' in GeneraicUDAFSum and GenericUDAFCount, which was added to support Window functions.