Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-20345

Adds an Expand node only if there are multiple distinct aggregate functions in an Aggregate when executes SplitAggregateRule

    XMLWordPrintableJSON

Details

    Description

      As mentioned in Flink Document, we could split distinct aggregation to solve skew data on distinct keys which is a very good optimization. However, an unnecessary `Expand` node will be generated under some special cases, like the following sql. 

      SELECT COUNT(c) AS pv, COUNT(DISTINCT c) AS uv FROM T GROUP BY a
      

      Which plan is like the following text, the Expand and filter condition in aggregate functions could be removed.

      Sink(name=[DataStreamTableSink], fields=[pv, uv])
      +- Calc(select=[pv, uv])
         +- GroupAggregate(groupBy=[a], partialFinalType=[FINAL], select=[a, $SUM0_RETRACT($f2_0) AS $f1, $SUM0_RETRACT($f3) AS $f2])
            +- Exchange(distribution=[hash[a]])
               +- GroupAggregate(groupBy=[a, $f2], partialFinalType=[PARTIAL], select=[a, $f2, COUNT(c) FILTER $g_1 AS $f2_0, COUNT(DISTINCT c) FILTER $g_0 AS $f3])
                  +- Exchange(distribution=[hash[a, $f2]])
                     +- Calc(select=[a, c, $f2, =($e, 1) AS $g_1, =($e, 0) AS $g_0])
                        +- Expand(projects=[{a=[$0], c=[$1], $f2=[$2], $e=[0]}, {a=[$0], c=[$1], $f2=[null], $e=[1]}])
                           +- Calc(select=[a, c, MOD(HASH_CODE(c), 1024) AS $f2])
                              +- MiniBatchAssigner(interval=[1000ms], mode=[ProcTime])
                                 +- DataStreamScan(table=[[default_catalog, default_database, T]], fields=[a, b, c])

      An `Expand` node is only necessary when multiple aggregate function with different distinct keys appear in an Aggregate.

      Attachments

        Issue Links

          Activity

            People

              jingzhang Jing Zhang
              jingzhang Jing Zhang
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: