Details
-
New Feature
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
Description
To resolve data-skew for distinct aggregates on stream, we introduce a rule named SplitAggregateRule which rewrites an aggregate query with distinct aggregations into an expanded double aggregations. The first aggregation compute the results in sub-partition(with bucket) and the results are combined by the second aggregation.
if two-stage aggregation is also enabled, we find that many plans have common pattern, looks like:
... (output) StreamExecGlobalGroupAggregate (final global agg) +- StreamExecExchange +- StreamExecLocalGroupAggregate (final local agg) +- StreamExecGlobalGroupAggregate (partial global agg) +- .... (input)
There is no exchange between the final local aggregate and the partial global aggregate, so they will be executed in a same JobVertex, and could share state. We introduce a rule named IncrementalAggregateRule to do that optimization.
Attachments
Issue Links
- is a child of
-
FLINK-11488 Add a basic Blink planner framework
- Closed
- links to