Flink / FLINK-7144

Optimize multiple LogicalAggregate into one

Details

    Description

      When a query applies multiple GROUP BYs, the first GROUP BY contains no aggregates or expressions, and the grouping fields of the second GROUP BY are a subset of the grouping fields of the first, then the first GROUP BY can be removed.

      For example, the following SQL

      SELECT a FROM (SELECT a,b,c FROM MyTable GROUP BY a, b, c) GROUP BY a
      

      should be optimized into

      DataStreamGroupAggregate(groupBy=[a], select=[a])
      DataStreamCalc(select=[a])
      DataStreamScan(table=[[_DataStreamTable_0]])
      

      but the plan we currently get is:

      DataStreamGroupAggregate(groupBy=[a], select=[a])
      DataStreamCalc(select=[a])
      DataStreamGroupAggregate(groupBy=[a, b, c], select=[a, b, c])
      DataStreamScan(table=[[_DataStreamTable_0]])
      

      I looked through the Calcite built-in rules but couldn't find a matching one. So maybe we should implement one, and maybe we should implement it in Calcite.
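      To make the proposed condition concrete, here is a minimal sketch of the rewrite on a toy plan model. It is not Calcite or Flink code: `Scan`, `Calc`, `GroupAggregate` and `remove_redundant_group_by` are hypothetical names, and `Calc` is assumed to be a pure column projection. The sketch also assumes the outer GROUP BY has no aggregate calls, as in the example above, which is stricter than the description; an outer COUNT, for instance, would see more rows once the inner GROUP BY is gone.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical toy plan nodes, NOT Calcite/Flink classes.
@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Calc:
    select: Tuple[str, ...]  # assumed: plain column references only
    input: "Node"


@dataclass(frozen=True)
class GroupAggregate:
    group_by: Tuple[str, ...]
    agg_calls: Tuple[str, ...]  # empty means a pure de-duplicating GROUP BY
    input: "Node"


Node = Union[Scan, Calc, GroupAggregate]


def remove_redundant_group_by(node: Node) -> Node:
    """Drop an inner GROUP BY that only de-duplicates a superset of the outer keys."""
    if isinstance(node, Scan):
        return node
    if isinstance(node, Calc):
        return Calc(node.select, remove_redundant_group_by(node.input))

    # node is a GroupAggregate: optimize the subtree first.
    child = remove_redundant_group_by(node.input)

    # Look through at most one projection between the two aggregates.
    calc = child if isinstance(child, Calc) else None
    inner = calc.input if calc is not None else child

    outer_keys = set(node.group_by)
    removable = (
        isinstance(inner, GroupAggregate)
        and not inner.agg_calls                      # no aggregates in the first GROUP BY
        and not node.agg_calls                       # conservative assumption, see lead-in
        and outer_keys <= set(inner.group_by)        # outer keys are a subset of inner keys
        and (calc is None or outer_keys <= set(calc.select))
    )
    if not removable:
        return GroupAggregate(node.group_by, node.agg_calls, child)

    # Re-attach the projection (if any) directly on top of the inner aggregate's input.
    new_input = inner.input if calc is None else Calc(calc.select, inner.input)
    return GroupAggregate(node.group_by, node.agg_calls, new_input)


# The plan from this issue: GROUP BY a over GROUP BY a, b, c.
plan = GroupAggregate(
    ("a",), (),
    Calc(("a",), GroupAggregate(("a", "b", "c"), (), Scan("MyTable"))),
)
optimized = remove_redundant_group_by(plan)
```

      With these assumptions, `optimized` has the same shape as the desired plan above: an aggregate on `a`, over a projection of `a`, over the scan. A real rule would additionally have to handle expressions in the projection and field index remapping, which the sketch leaves out.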

          People

            Assignee: Unassigned
            Reporter: Jark Wu (jark)