The partitioned aggregation node always repartitions spilled partitions. This often doesn't make sense, because if only a small number of partitions were spilled, it's likely that a single partition will fit easily in memory. Instead it should check to see if the partition is likely to fit in memory and if so, just pin aggregated_row_stream, rebuild the hash table, and reprocess unaggregated_row_stream. The partitioned hash join node already does the equivalent thing.
Changing this would improve performance for spilled aggregations by avoiding unnecessary repartitioning. It would also solve a corner case where Impala gives up on repartitioning despite the partition fitting in memory (see
There is a TODO in the code in PartitionedAggregationNode::NextPartition() but creating a JIRA to track the issue.