[FLINK-12161] Supports partial-final optimization for stream group aggregate - ASF JIRA

XML

Word

Printable

JSON

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.9.0
Component/s: Table SQL / Planner
Labels:
- pull-request-available

Description

To resolve data-skew for distinct aggregates on stream, we introduce a rule named SplitAggregateRule which rewrites an aggregate query with distinct aggregations into an expanded double aggregations. The first aggregation compute the results in sub-partition(with bucket) and the results are combined by the second aggregation.

if two-stage aggregation is also enabled, we find that many plans have common pattern, looks like:

... (output)
StreamExecGlobalGroupAggregate (final global agg)
+- StreamExecExchange
     +- StreamExecLocalGroupAggregate (final local agg)
          +- StreamExecGlobalGroupAggregate (partial global agg)
               +- .... (input)

There is no exchange between the final local aggregate and the partial global aggregate, so they will be executed in a same JobVertex, and could share state. We introduce a rule named IncrementalAggregateRule to do that optimization.

Attachments

Issue Links

is a child of

FLINK-11488 Add a basic Blink planner framework

Closed

links to

GitHub Pull Request #8148

Activity

People

Assignee:: godfrey he

Reporter:: godfrey he

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Dates

Created:: 11/Apr/19 07:17

Updated:: 13/Apr/19 12:04

Resolved:: 13/Apr/19 12:04

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

20m