Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-23356

Hash aggregation is always disabled while processing querys with grouping sets expressions.

    XMLWordPrintableJSON

    Details

      Description

      Hi, we find that many MapReduce map attempts disable hash aggregation while processing GroupBy Operator with grouping sets expressions. Logs are just like this:
      `Disable Hash Aggr: #hash table = 104427 #total = 100000 reduction = 1.0 minReduction = 0.5`, which means hashTable size is even larger than the number of input rows.

      Dig deeper we can find that GroupBy operator disables hash aggregation just based on the number of input rows instead of rows actually processed. For GroupBy Operator with grouping sets (such as: roll up, cube, etc.), `numberRowsProcessed` equals `groupingSets.size()*numberRowsInput`, causing hashTableSize much larger than querys without grouping sets expressions.

      Let's consider a example below:

      • Query:
        SELECT col1, col2, MAX(col0) FROM tbl GROUP BY col1, col2 GROUPING SETS (col1,col2);
      • Input data:
        col0, col1, col2
        [0,-1,-1]
        [1,-1,-1]
        [2,-1,-1]
        [3,3,3]
        [4,4,4]
      • Configuration:
        hive.groupby.mapaggr.checkinterval = 1;
        minReductionHashAggr = 0.5;
      • Currently, there are 10 rows in partial aggregate result. (hash aggregation is disabled very early):
        [-1, null, 0]
        [null, -1, 0]
        [-1, null, 1]
        [null, -1, 1]
        [-1, null, 2]
        [null, -1, 2]
        [3, null, 3]
        [null, 3, 3]
        [4, null, 4]
        [null, 4, 4]

      The patch below disables hash aggregation based on the number of rows actually processed, reducing partial result entries a lot and improving the performance of aggregation. There are only 6 rows in partial aggregate result:

      [-1, null, 2]
      [null, -1, 2]
      [null, 3, 3]
      [4, null, 4]
      [null, 4, 4]
      [3, null, 3]

        Attachments

        1. HIVE-23356.01.patch
          10 kB
          Qiang.Kang
        2. HIVE-23356.02.patch
          10 kB
          Qiang.Kang
        3. HIVE-23356.03.patch
          9 kB
          Qiang.Kang

          Activity

            People

            • Assignee:
              Qiang.Kang Qiang.Kang
              Reporter:
              Qiang.Kang Qiang.Kang
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: