CALCITE-1706

DruidAggregateFilterTransposeRule causes very fine-grained aggregations to be pushed to Druid

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.12.0
    • Component/s: None
    • Labels:
      None

      Description

      Enabling DruidAggregateFilterTransposeRule may cause very fine-grained aggregations to be pushed to Druid.

      Running DruidAdapterIT.testFilterTimestamp, here is the previous plan (with DruidAggregateFilterTransposeRule disabled):

      EnumerableInterpreter
        BindableAggregate(group=[{}], C=[COUNT()])
          BindableFilter(condition=[AND(>=(/INT(Reinterpret($0), 86400000), 1997-01-01), <(/INT(Reinterpret($0), 86400000), 1998-01-01), OR(AND(>=(/INT(Reinterpret($0), 86400000), 1997-04-01), <(/INT(Reinterpret($0), 86400000), 1997-05-01)), AND(>=(/INT(Reinterpret($0), 86400000), 1997-06-01), <(/INT(Reinterpret($0), 86400000), 1997-07-01))))])
            DruidQuery(table=[[foodmart, foodmart]], intervals=[[1900-01-09T00:00:00.000/2992-01-10T00:00:00.000]], projects=[[$0]])
      

      Here is the (in my opinion inferior) plan with DruidAggregateFilterTransposeRule enabled:

      EnumerableInterpreter
        BindableAggregate(group=[{}], C=[$SUM0($1)])
          BindableFilter(condition=[AND(=(EXTRACT_DATE(FLAG(YEAR), /INT(Reinterpret($0), 86400000)), 1997), OR(=(EXTRACT_DATE(FLAG(MONTH), /INT(Reinterpret($0), 86400000)), 4), =(EXTRACT_DATE(FLAG(MONTH), /INT(Reinterpret($0), 86400000)), 6)))])
            DruidQuery(table=[[foodmart, foodmart]], intervals=[[1900-01-09T00:00:00.000/2992-01-10T00:00:00.000]], groups=[{0}], aggs=[[COUNT()]])
      

      Note that the DruidQuery is aggregating on __timestamp. Given that __timestamp is very high cardinality, is this an efficient operation for Druid?
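To make the trade-off concrete, here is a minimal sketch of the two plan shapes over an in-memory list of timestamps. It is not Calcite or Druid code: the class `TransposeSketch`, its method names, and the sample data are all hypothetical. It only illustrates that filtering after a per-timestamp COUNT and then summing the partial counts gives the same answer as filtering first, while the grouped form has to produce one row per distinct timestamp.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.LongPredicate;

public class TransposeSketch {
  /** Shape of the plan without the rule: filter the rows, then COUNT(). */
  static long filterThenCount(List<Long> timestamps, LongPredicate filter) {
    return timestamps.stream().mapToLong(Long::longValue).filter(filter).count();
  }

  /** Partial aggregate: COUNT() grouped by timestamp, one entry per distinct value. */
  static Map<Long, Long> countByTimestamp(List<Long> timestamps) {
    Map<Long, Long> counts = new TreeMap<>();
    for (long ts : timestamps) {
      counts.merge(ts, 1L, Long::sum);
    }
    return counts;
  }

  /** Shape of the plan with the rule: filter the groups, then sum the partial counts. */
  static long filterGroupsThenSum(Map<Long, Long> counts, LongPredicate filter) {
    long total = 0; // like $SUM0, an empty input yields 0 rather than NULL
    for (Map.Entry<Long, Long> e : counts.entrySet()) {
      if (filter.test(e.getKey())) {
        total += e.getValue();
      }
    }
    return total;
  }

  public static void main(String[] args) {
    // Made-up sample data; real __timestamp values would be far more numerous.
    List<Long> timestamps = List.of(10L, 10L, 20L, 30L, 30L, 30L, 40L);
    LongPredicate filter = ts -> ts >= 20L && ts < 40L;
    Map<Long, Long> groups = countByTimestamp(timestamps);
    System.out.println("filter then count:      " + filterThenCount(timestamps, filter));
    System.out.println("intermediate groups:    " + groups.size());
    System.out.println("filter groups then sum: " + filterGroupsThenSum(groups, filter));
  }
}
```

The intermediate map grows with the number of distinct timestamps, which is the cost the question above is asking about.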

      For this particular query, the ideal would be to push the filter into the intervals clause. Then we would not need to group by __timestamp. I am not sure why this is not happening.
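As an illustration of what pushing the filter into the intervals clause would mean for this query, the sketch below turns "year = 1997 and month in (4, 6)" into half-open intervals in the same start/end notation the plans above use. The class `IntervalSketch` and the method `monthIntervals` are hypothetical and only show the arithmetic; this is not how the Druid adapter derives intervals.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

public class IntervalSketch {
  /** Returns one half-open start/end interval per requested month of the given year. */
  static List<String> monthIntervals(int year, int... months) {
    List<String> intervals = new ArrayList<>();
    for (int month : months) {
      YearMonth ym = YearMonth.of(year, month);
      // The interval runs from the first day of the month to the first day of the next.
      String start = ym.atDay(1) + "T00:00:00.000";
      String end = ym.plusMonths(1).atDay(1) + "T00:00:00.000";
      intervals.add(start + "/" + end);
    }
    return intervals;
  }

  public static void main(String[] args) {
    // The filter from DruidAdapterIT.testFilterTimestamp: April and June of 1997.
    System.out.println(monthIntervals(1997, 4, 6));
  }
}
```

With two such intervals in place of the single 1900/2992 interval, the query could be a plain COUNT with no grouping on __timestamp.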

      Nishant Bangarwa, slim bouguerra, how bad is the query with DruidAggregateFilterTransposeRule enabled, in your opinion? Is this a show-stopper for Calcite 1.12?

          Activity

          bslim slim bouguerra added a comment -

          Julian Hyde, I don't think it is a show-stopper. Druid can handle grouping on timestamp and does well (compared to pulling all rows and doing the grouping elsewhere) in some cases. Of course we can always find a corner case where this is expensive, but in general it is better to do the GROUP BY within Druid itself.

          julianhyde Julian Hyde added a comment -

          slim bouguerra, Thanks. I disabled DruidAggregateFilterTransposeRule for now, and that returns us to the previous behavior, which was not too bad. With the rule enabled, the test was failing due to CALCITE-1436 (this wouldn't happen in Hive, but still, a failing test reduces coverage).

          We should consider re-enabling the rule when we have your fix for CALCITE-1707.

          julianhyde Julian Hyde added a comment -

          Fixed (by which I mean I disabled the rule) in http://git-wip-us.apache.org/repos/asf/calcite/commit/6b54b6ec.

          julianhyde Julian Hyde added a comment -

          Resolved in release 1.12.0 (2017-03-24).


            People

            • Assignee:
              julianhyde Julian Hyde
              Reporter:
              julianhyde Julian Hyde