[IMPALA-2708] Partitioned aggregation node repartitions when spilled partition could fit in memory - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: Impala 2.3.0
Fix Version/s: Impala 2.10.0
Component/s: Backend
Labels:
- performance
- resource-management

Target Version: Product Backlog

Description

The partitioned aggregation node always repartitions spilled partitions. This is often unnecessary: if only a small number of partitions were spilled, a single spilled partition is likely to fit easily in memory. Instead, the node should check whether the partition is likely to fit in memory and, if so, pin aggregated_row_stream, rebuild the hash table, and reprocess unaggregated_row_stream. The partitioned hash join node already does the equivalent.

Changing this would improve performance for spilled aggregations by avoiding unnecessary repartitioning. It would also solve a corner case where Impala gives up on repartitioning despite the partition fitting in memory (see IMPALA-2676).
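
The proposed check can be illustrated with a toy Python sketch. This is not Impala's C++ code: `SpilledPartition`, `process_spilled_partition`, `max_groups_in_memory`, `FANOUT`, and the byte-sum `bucket` function are all hypothetical names chosen for illustration, and a simple row count stands in for whatever memory estimate a real implementation would use. The error text is borrowed from the message quoted in IMPALA-5161.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

FANOUT = 4  # hypothetical number of child partitions per repartitioning step
Row = Tuple[str, int]  # (group key, value or partial sum)


@dataclass
class SpilledPartition:
    # Rows that were already aggregated before the partition spilled.
    aggregated_rows: List[Row] = field(default_factory=list)
    # Raw input rows that arrived after the partition spilled.
    unaggregated_rows: List[Row] = field(default_factory=list)
    level: int = 0


def row_count(p: SpilledPartition) -> int:
    # Pessimistic stand-in for a size estimate: every row could be a new group.
    return len(p.aggregated_rows) + len(p.unaggregated_rows)


def bucket(key: str, level: int) -> int:
    # Toy deterministic hash: a different base-FANOUT digit per level.
    return (sum(key.encode()) // FANOUT ** level) % FANOUT


def aggregate_in_memory(p: SpilledPartition) -> Dict[str, int]:
    # "Rebuild the hash table" from the aggregated rows, then fold in
    # the unaggregated rows (a SUM aggregate is assumed here).
    table: Dict[str, int] = {}
    for key, partial in p.aggregated_rows:
        table[key] = table.get(key, 0) + partial
    for key, value in p.unaggregated_rows:
        table[key] = table.get(key, 0) + value
    return table


def repartition(p: SpilledPartition) -> List[SpilledPartition]:
    children = [SpilledPartition(level=p.level + 1) for _ in range(FANOUT)]
    for key, partial in p.aggregated_rows:
        children[bucket(key, p.level)].aggregated_rows.append((key, partial))
    for key, value in p.unaggregated_rows:
        children[bucket(key, p.level)].unaggregated_rows.append((key, value))
    return children


def process_spilled_partition(
    p: SpilledPartition, max_groups_in_memory: int, stats: Dict[str, int]
) -> Dict[str, int]:
    # The check proposed in this issue: skip repartitioning when the
    # partition is likely to fit in memory.
    if row_count(p) <= max_groups_in_memory:
        stats["in_memory"] += 1
        return aggregate_in_memory(p)

    stats["repartitioned"] += 1
    children = repartition(p)
    if max(row_count(c) for c in children) == row_count(p):
        raise RuntimeError(
            "Repartitioning did not reduce the size of a spilled partition"
        )
    result: Dict[str, int] = {}
    for child in children:
        if row_count(child) > 0:
            result.update(
                process_spilled_partition(child, max_groups_in_memory, stats)
            )
    return result
```

In this sketch, a partition holding five rows is aggregated in place when `max_groups_in_memory` is 8, and is repartitioned once into smaller children when the budget is 2. Both paths produce the same per-key sums; only the amount of work differs.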

There is already a TODO in the code in PartitionedAggregationNode::NextPartition(), but this JIRA was created to track the issue.

Issue Links

breaks: IMPALA-5788 Spilling aggregation crashes when grouping by nondeterministic expression (Resolved)
is duplicated by: IMPALA-5161 TPC-DS Q78 with MEM_LIMIT=10GB fails with "Repartitioning did not reduce the size of a spilled partition" on BufferPool dev branch (Resolved)
is related to: IMPALA-5691 test_low_mem_limit_q18 is flaky (Resolved)

People

Assignee: Tim Armstrong
Reporter: Tim Armstrong
Votes: 0
Watchers: 2

Dates

Created: 24/Nov/15 18:08
Updated: 10/Aug/17 23:54
Resolved: 05/Aug/17 03:21