  1. Flink
  2. FLINK-14133 Improve batch sql and hive integrate performance milestone-1
  3. FLINK-14874

add local aggregate to solve data skew for ROLLUP/CUBE case

Details

    Description

      Many TPC-DS queries use the ROLLUP keyword, which is translated into multiple grouping sets.
      For example, group by rollup (channel, id) is equivalent to group by (channel, id) + group by (channel) + group by ().
      All data for the empty group is shuffled to a single node, which is a typical data skew case. With a local aggregate, the amount of data shuffled to that single node is greatly reduced. However, the cost model currently cannot estimate the local aggregate's cost, so a plan without the local aggregate may be chosen even when the query uses the ROLLUP keyword.
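To make the skew concrete, here is a minimal Python sketch (not Flink code) that expands ROLLUP into its grouping sets and counts how many records reach the single node owning the empty group, with and without a per-partition local aggregate. The function names, the `amount` column, and the toy data are assumptions made for illustration only.

```python
from collections import defaultdict


def rollup_grouping_sets(keys):
    """ROLLUP(k1, ..., kn) yields every prefix of the keys, from full to empty."""
    return [tuple(keys[:i]) for i in range(len(keys), -1, -1)]


def expand(rows, keys):
    """Emit one (grouping_set, group_key, value) record per row per grouping set."""
    out = []
    for row in rows:
        for gs in rollup_grouping_sets(keys):
            out.append((gs, tuple(row[k] for k in gs), row["amount"]))
    return out


def local_aggregate(records):
    """Pre-sum values per (grouping set, group key) inside one partition."""
    acc = defaultdict(int)
    for gs, key, value in records:
        acc[(gs, key)] += value
    return [(gs, key, total) for (gs, key), total in acc.items()]


def shuffled_to_empty_group(partitions, keys, use_local_agg):
    """Count the records sent to the single node that owns the empty group ()."""
    count = 0
    for rows in partitions:
        records = expand(rows, keys)
        if use_local_agg:
            records = local_aggregate(records)
        count += sum(1 for gs, _, _ in records if gs == ())
    return count


# Toy input (made up): 4 upstream partitions with 1000 rows each.
partitions = [
    [{"channel": f"c{i % 3}", "id": i % 50, "amount": 1} for i in range(1000)]
    for _ in range(4)
]
keys = ["channel", "id"]

print(rollup_grouping_sets(keys))  # [('channel', 'id'), ('channel',), ()]
print(shuffled_to_empty_group(partitions, keys, use_local_agg=False))  # 4000
print(shuffled_to_empty_group(partitions, keys, use_local_agg=True))   # 4
```

Without the local aggregate, every input row contributes one record to the empty group. With it, each upstream partition sends a single partial result.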
      We could add a rule-based phase (after the physical phase) that enforces a local aggregate if its input has an empty group.
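The proposed rule could look roughly like the following Python sketch over a toy plan tree. It is only an illustration under stated assumptions: `Node`, its `kind` and `mode` values, and `enforce_local_agg` are hypothetical names, not Flink planner classes, and the plan is modeled as a simple single-input chain.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical physical plan node with at most one input."""
    kind: str                                        # "scan", "expand", "exchange", "agg"
    child: Optional["Node"] = None
    grouping_sets: Tuple[Tuple[str, ...], ...] = ()  # used by "expand" only
    mode: str = ""                                   # used by "agg": "one-phase", "local", "global"


def has_empty_group(node):
    """True if the subtree contains an expand that emits the empty grouping set ()."""
    while node is not None:
        if node.kind == "expand" and () in node.grouping_sets:
            return True
        node = node.child
    return False


def enforce_local_agg(node):
    """Rewrite one-phase agg <- exchange <- input into global <- exchange <- local <- input
    whenever the input produces an empty group."""
    if node is None:
        return None
    child = enforce_local_agg(node.child)
    node = replace(node, child=child)
    if (node.kind == "agg" and node.mode == "one-phase"
            and child is not None and child.kind == "exchange"
            and has_empty_group(child.child)):
        local = Node("agg", child=child.child, mode="local")
        return Node("agg", child=replace(child, child=local), mode="global")
    return node


def describe(node):
    """Render the chain from root to leaf for easy comparison."""
    parts = []
    while node is not None:
        parts.append(f"{node.mode}-agg" if node.kind == "agg" else node.kind)
        node = node.child
    return " <- ".join(parts)


rollup_plan = Node("agg", mode="one-phase", child=Node("exchange", child=Node(
    "expand", grouping_sets=(("channel", "id"), ("channel",), ()), child=Node("scan"))))
plain_plan = Node("agg", mode="one-phase", child=Node("exchange", child=Node(
    "expand", grouping_sets=(("channel",),), child=Node("scan"))))

print(describe(enforce_local_agg(rollup_plan)))
print(describe(enforce_local_agg(plain_plan)))
```

Because the rewrite runs as a separate pass after cost-based physical planning, it does not depend on the cost model being able to price the local aggregate.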

            People

              godfreyhe godfrey he
