Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-3200 Replace BufferedBlockMgr with new buffer pool
  3. IMPALA-2708

Partitioned aggregation node repartitions when spilled partition could fit in memory

    XMLWordPrintableJSON

    Details

      Description

      The partitioned aggregation node always repartitions spilled partitions. This often doesn't make sense, because if only a small number of partitions were spilled, it's likely that a single partition will fit easily in memory. Instead it should check to see if the partition is likely to fit in memory and if so, just pin aggregated_row_stream, rebuild the hash table, and reprocess unaggregated_row_stream. The partitioned hash join node already does the equivalent thing.

      Changing this would improve performance for spilled aggregations by avoiding unnecessary repartitioning. It would also solve a corner case where Impala gives up on repartitioning despite the partition fitting in memory (see IMPALA-2676).

      There is a TODO in the code in PartitionedAggregationNode::NextPartition() but creating a JIRA to track the issue.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                tarmstrong Tim Armstrong
                Reporter:
                tarmstrong Tim Armstrong
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: