Details
-
Sub-task
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
Impala 2.3.0
Description
The partitioned aggregation node always repartitions spilled partitions. This often doesn't make sense, because if only a small number of partitions were spilled, it's likely that a single partition will fit easily in memory. Instead it should check to see if the partition is likely to fit in memory and if so, just pin aggregated_row_stream, rebuild the hash table, and reprocess unaggregated_row_stream. The partitioned hash join node already does the equivalent thing.
Changing this would improve performance for spilled aggregations by avoiding unnecessary repartitioning. It would also solve a corner case where Impala gives up on repartitioning despite the partition fitting in memory (see IMPALA-2676).
There is a TODO in the code in PartitionedAggregationNode::NextPartition() but creating a JIRA to track the issue.
Attachments
Issue Links
- breaks
-
IMPALA-5788 Spilling aggregation crashes when grouping by nondeterministic expression
- Resolved
- is duplicated by
-
IMPALA-5161 TPC-DS Q78 with MEM_LIMIT=10GB fails with "Repartitioning did not reduce the size of a spilled partition" on BufferPool dev branch
- Resolved
- is related to
-
IMPALA-5691 test_low_mem_limit_q18 is flaky
- Resolved