Hive / HIVE-20153

Count and Sum UDF consume more memory in Hive 2+

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.2
    • Fix Version/s: 4.0.0
    • Component/s: UDF
    • Labels: None

      Description

      While testing Hive 2, we noticed that queries with many count() and sum() aggregations run out of memory on the Hadoop side. The same queries worked in Hive 1.

      For many queries we have had to double the mapper memory (in our case, raising mapreduce.map.java.opts from -Xmx2000M to -Xmx4000M). This makes upgrading to Hive 2 harder.

      A heap dump shows that one of the main culprits is the field 'uniqueObjects' in GenericUDAFSum and GenericUDAFCount, which was added to support window functions.
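      The following is a minimal, hypothetical sketch of why a per-buffer field like 'uniqueObjects' can inflate mapper memory. It is not Hive's actual code. It assumes that map-side hash aggregation keeps one aggregation buffer per group key. It compares two buffers: one allocates a HashSet for every group, and the other creates the set only when distinct windowing is actually requested. Apart from 'uniqueObjects', every name here (SumBufferSketch, SumAgg, allocatedSets) is invented for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SumBufferSketch {

    // Hypothetical per-group aggregation buffer. Only the field name
    // 'uniqueObjects' mirrors the issue; everything else is illustrative.
    static final class SumAgg {
        long sum;
        Set<Object> uniqueObjects; // only needed for DISTINCT in window functions

        SumAgg(boolean eager) {
            // Eager mode pays for a HashSet in every group, even if it is never used.
            this.uniqueObjects = eager ? new HashSet<>() : null;
        }

        void add(long value, boolean windowingDistinct) {
            if (windowingDistinct) {
                if (uniqueObjects == null) {
                    uniqueObjects = new HashSet<>(); // lazy: allocate on first real use
                }
                if (!uniqueObjects.add(value)) {
                    return; // duplicate value, do not add it to the sum again
                }
            }
            sum += value;
        }
    }

    // Feeds 2 * groups values into 'groups' keys (like a map-side hash
    // aggregation) and returns how many buffers ended up holding a HashSet.
    static int allocatedSets(int groups, boolean eager, boolean windowingDistinct) {
        Map<Integer, SumAgg> buffers = new HashMap<>();
        for (int i = 0; i < groups * 2; i++) {
            int key = i % groups;
            buffers.computeIfAbsent(key, k -> new SumAgg(eager)).add(i, windowingDistinct);
        }
        int count = 0;
        for (SumAgg agg : buffers.values()) {
            if (agg.uniqueObjects != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int groups = 100_000;
        System.out.println("eager sets: " + allocatedSets(groups, true, false));
        System.out.println("lazy sets:  " + allocatedSets(groups, false, false));
    }
}
```

      Running main prints 100000 sets for the eager buffer and 0 for the lazy one. A query with many count() and sum() columns carries one such buffer per aggregation per group, which is one plausible way this overhead accumulates. This sketch only illustrates the lazy-allocation idea. It does not describe how the Hive fix was actually implemented.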



            People

            • Assignee: Aihua Xu (aihuaxu)
            • Reporter: Szehon Ho (szehon)
            • Votes: 0
            • Watchers: 5
