Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-12161

Supports partial-final optimization for stream group aggregate

    XMLWordPrintableJSON

Details

    Description

      To resolve data-skew for distinct aggregates on stream, we introduce a rule named SplitAggregateRule which rewrites an aggregate query with distinct aggregations into an expanded double aggregations. The first aggregation compute the results in sub-partition(with bucket) and the results are combined by the second aggregation.

      if two-stage aggregation is also enabled, we find that many plans have common pattern, looks like:

      ... (output)
      StreamExecGlobalGroupAggregate (final global agg)
      +- StreamExecExchange
           +- StreamExecLocalGroupAggregate (final local agg)
                +- StreamExecGlobalGroupAggregate (partial global agg)
                     +- .... (input)
      

      There is no exchange between the final local aggregate and the partial global aggregate, so they will be executed in a same JobVertex, and could share state. We introduce a rule named IncrementalAggregateRule to do that optimization.

      Attachments

        Issue Links

          Activity

            People

              godfreyhe godfrey he
              godfreyhe godfrey he
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m