Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-732

Implement multiple distinct-COUNT using GROUPING SETS

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0-incubating
    • Component/s: None
    • Labels:
      None

      Description

      Currently if a query has COUNT(DISTINCT x) and COUNT(DISTINCT y) we compute the distinct counts separately and combine them using a join. The join isn't too expensive (because usually the GROUP BY has only a few keys) but we make multiple scans over the base table.

      I think we could translate multiple distinct-counts into a GROUPING SETS query (i.e. an Aggregate with more than one element in the groupSets field). If the underlying engine can evaluate that efficiently, then we have saved ourselves a join and several scans.

        Activity

        Hide
        julianhyde Julian Hyde added a comment -

        For example,

        select "department_id", count(distinct "gender") as g, count(distinct "education_level") as e
        from foodmart."employee"
        group by "department_id"

        can be implemented using

        select "department_id",
          grouping("gender") as ig, "gender",
          grouping("education_level") as ie, "education_level"
        from foodmart."employee"
        group by "department_id", grouping sets ("gender", "education_level");
        Show
        julianhyde Julian Hyde added a comment - For example, select "department_id" , count(distinct "gender" ) as g, count(distinct "education_level" ) as e from foodmart. "employee" group by "department_id" can be implemented using select "department_id" , grouping( "gender" ) as ig, "gender" , grouping( "education_level" ) as ie, "education_level" from foodmart. "employee" group by "department_id" , grouping sets ( "gender" , "education_level" );
        Hide
        julianhyde Julian Hyde added a comment - - edited

        The query

        select "department_id" as d,
          count(distinct "gender", "education_level") as c0,
          count(distinct "gender") as c1,
          count(distinct "education_level") as c2
        from foodmart_clone."employee"
        group by "department_id";

        can be evaluated using an equivalent query that uses grouping sets:

        select d,
          count(case i when 0 then 1 else null end) as c0,
          count(case i when 1 then 1 else null end) as c1,
          count(case i when 2 then 1 else null end) as c2
        from (
          select "department_id" as d,
            grouping_id("gender", "education_level") as i,
           "gender" as g,
           "education_level" as e
          from foodmart_clone."employee"
          group by "department_id",
            grouping sets ("gender", "education_level", ("gender", "education_level")))
        group by d;
        Show
        julianhyde Julian Hyde added a comment - - edited The query select "department_id" as d, count(distinct "gender" , "education_level" ) as c0, count(distinct "gender" ) as c1, count(distinct "education_level" ) as c2 from foodmart_clone. "employee" group by "department_id" ; can be evaluated using an equivalent query that uses grouping sets: select d, count( case i when 0 then 1 else null end) as c0, count( case i when 1 then 1 else null end) as c1, count( case i when 2 then 1 else null end) as c2 from ( select "department_id" as d, grouping_id( "gender" , "education_level" ) as i, "gender" as g, "education_level" as e from foodmart_clone. "employee" group by "department_id" , grouping sets ( "gender" , "education_level" , ( "gender" , "education_level" ))) group by d;
        Show
        julianhyde Julian Hyde added a comment - Fix can be previewed at https://github.com/julianhyde/incubator-calcite/tree/732-grouping-sets-distinct-count .
        Show
        julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/3e50232b .
        Hide
        jnadeau Jacques Nadeau added a comment -

        Resolved in release 1.4.0-incubating (2015-08-23)

        Show
        jnadeau Jacques Nadeau added a comment - Resolved in release 1.4.0-incubating (2015-08-23)

          People

          • Assignee:
            julianhyde Julian Hyde
            Reporter:
            julianhyde Julian Hyde
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development