Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-370

Support GROUPING SETS in SQL and algebra

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0-incubating
    • Component/s: None
    • Labels:
      None

      Description

      Support grouping sets in the relational algebra. This feature would add a new field, groupKeys, to aggregate rel that has a set of bit maps, each of which has a grouping key.

      For example, if 'group by x, y, z' is represented by

      AggregateRel(groupSet={0, 1, 2})
      

      then 'group by grouping sets (x, y, z), (y, z), ()' would be represented by

      AggregateRel(groupSet={0, 1, 2}, groupSets=[ {0, 1, 2}, {1, 2}, {} ])
      

      A regular group by, or a group by with a singleton grouping set, would have groupSets = null.

      If groupSets is present, the returned row type would include an indicator column for each grouping key.

        Activity

        Hide
        julianhyde Julian Hyde added a comment - - edited

        Started work in https://github.com/julianhyde/incubator-calcite/tree/calcite-370, building on the work for CALCITE-296.

        My plan is to generalize the Aggregate class to have a list of grouping sets. If you use simple GROUP BY x, y you will get just one grouping set as if you had written GROUP BY GROUPING SETS (x, y). The output row type will include indicator columns, one for each distinct grouping expression (see the GROUPING function, https://docs.oracle.com/cd/B19306_01/server.102/b14223/aggreg.htm#i1007434).

        public class Aggregate extends SingleRel {
          public final BitSet groupKey;
          public final List<BitSet> groupSets;
          public final GroupingType groupingType;
        
          enum GroupingType {
            SINGLE, // one grouping set
            ROLLUP, // roll up leading edge: (x, y, z), (x, y), (x), ()
            CUBE, // the full 2^n grouping sets
            OTHER // not one of the above
          }
        }
        

        The row type for select k0, k1, sum(c) as a0, sum(d) as a1, sum(e) as a2 from t group k0, k1 would be (k0, g0, k1, g1, a0, a1, a2). Note the indicator columns g0, g1. g0 evaluates GROUPING(k0), saying whether this row is a roll up over all g0 values.

        Existing rules will have to be changed to only fire if groupingType == SINGLE, and skip over the indicator columns.

        Show
        julianhyde Julian Hyde added a comment - - edited Started work in https://github.com/julianhyde/incubator-calcite/tree/calcite-370 , building on the work for CALCITE-296 . My plan is to generalize the Aggregate class to have a list of grouping sets. If you use simple GROUP BY x, y you will get just one grouping set as if you had written GROUP BY GROUPING SETS (x, y) . The output row type will include indicator columns, one for each distinct grouping expression (see the GROUPING function, https://docs.oracle.com/cd/B19306_01/server.102/b14223/aggreg.htm#i1007434 ). public class Aggregate extends SingleRel { public final BitSet groupKey; public final List<BitSet> groupSets; public final GroupingType groupingType; enum GroupingType { SINGLE, // one grouping set ROLLUP, // roll up leading edge: (x, y, z), (x, y), (x), () CUBE, // the full 2^n grouping sets OTHER // not one of the above } } The row type for select k0, k1, sum(c) as a0, sum(d) as a1, sum(e) as a2 from t group k0, k1 would be (k0, g0, k1, g1, a0, a1, a2) . Note the indicator columns g0, g1. g0 evaluates GROUPING(k0) , saying whether this row is a roll up over all g0 values. Existing rules will have to be changed to only fire if groupingType == SINGLE, and skip over the indicator columns.
        Hide
        vladimirsitnikov Vladimir Sitnikov added a comment -

        What is the purpose of GroupingType?

        How are you going to support GROUPING SETS ((X, Y), (Z), (Y), ())?
        What would be the value of groupingType?

        What is the preferred implementation approach?
        "implementations that cannot implement grouping sets throw HaveNoIdeaHowToImplement" and "some Calcite rule rewrites grouping sets to a set of unions and retries"? (e.g. something similar to avg(..))

        Show
        vladimirsitnikov Vladimir Sitnikov added a comment - What is the purpose of GroupingType ? How are you going to support GROUPING SETS ((X, Y), (Z), (Y), ()) ? What would be the value of groupingType ? What is the preferred implementation approach? "implementations that cannot implement grouping sets throw HaveNoIdeaHowToImplement" and "some Calcite rule rewrites grouping sets to a set of unions and retries"? (e.g. something similar to avg(..) )
        Hide
        vladimirsitnikov Vladimir Sitnikov added a comment -

        Note the indicator columns g0, g1. g0 evaluates GROUPING(k0)

        There is nice GROUPING_ID function.
        We might want to support that as well.

        Show
        vladimirsitnikov Vladimir Sitnikov added a comment - Note the indicator columns g0, g1. g0 evaluates GROUPING(k0) There is nice GROUPING_ID function. We might want to support that as well.
        Show
        julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/6d79b5ebf7b1fa40455eb3df82998c58a8e08b8d .
        Hide
        julianhyde Julian Hyde added a comment -

        Closing now that 1.0.0-incubating has been released.

        Show
        julianhyde Julian Hyde added a comment - Closing now that 1.0.0-incubating has been released.

          People

          • Assignee:
            julianhyde Julian Hyde
            Reporter:
            julianhyde Julian Hyde
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development