Calcite / CALCITE-1706

DruidAggregateFilterTransposeRule causes very fine-grained aggregations to be pushed to Druid

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.12.0
    • Component/s: None
    • Labels: None

      Description

      Enabling DruidAggregateFilterTransposeRule may cause very fine-grained aggregations to be pushed to Druid.

      For DruidAdapterIT.testFilterTimestamp, here is the previous plan (with DruidAggregateFilterTransposeRule disabled):

      EnumerableInterpreter
        BindableAggregate(group=[{}], C=[COUNT()])
          BindableFilter(condition=[AND(>=(/INT(Reinterpret($0), 86400000), 1997-01-01), <(/INT(Reinterpret($0), 86400000), 1998-01-01), OR(AND(>=(/INT(Reinterpret($0), 86400000), 1997-04-01), <(/INT(Reinterpret($0), 86400000), 1997-05-01)), AND(>=(/INT(Reinterpret($0), 86400000), 1997-06-01), <(/INT(Reinterpret($0), 86400000), 1997-07-01))))])
            DruidQuery(table=[[foodmart, foodmart]], intervals=[[1900-01-09T00:00:00.000/2992-01-10T00:00:00.000]], projects=[[$0]])
      

      Here is the (in my opinion inferior) plan with DruidAggregateFilterTransposeRule enabled:

      EnumerableInterpreter
        BindableAggregate(group=[{}], C=[$SUM0($1)])
          BindableFilter(condition=[AND(=(EXTRACT_DATE(FLAG(YEAR), /INT(Reinterpret($0), 86400000)), 1997), OR(=(EXTRACT_DATE(FLAG(MONTH), /INT(Reinterpret($0), 86400000)), 4), =(EXTRACT_DATE(FLAG(MONTH), /INT(Reinterpret($0), 86400000)), 6)))])
            DruidQuery(table=[[foodmart, foodmart]], intervals=[[1900-01-09T00:00:00.000/2992-01-10T00:00:00.000]], groups=[{0}], aggs=[[COUNT()]])
      

      Note that the DruidQuery is now grouping on __timestamp (groups=[{0}]). Given that __timestamp has very high cardinality, is this an efficient operation for Druid?
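To make the concern concrete, here is a minimal sketch (plain Python, not Calcite or Druid code) of the two plan shapes on synthetic data. The function names and the data set are assumptions made up for illustration. Both shapes return the same count, but the second first materializes one group per distinct timestamp, so the grouping barely reduces the row count.

```python
from collections import Counter
from datetime import datetime, timedelta


def keep(ts):
    """Filter from the plans: year 1997 and month April or June."""
    return ts.year == 1997 and ts.month in (4, 6)


def count_filter_first(rows):
    """First plan shape: filter the raw rows, then COUNT()."""
    return sum(1 for ts in rows if keep(ts))


def count_group_first(rows):
    """Second plan shape: GROUP BY timestamp with COUNT(), filter the groups,
    then $SUM0 over the partial counts. Also returns the number of groups."""
    partial = Counter(rows)  # one group per distinct timestamp
    total = sum(n for ts, n in partial.items() if keep(ts))
    return total, len(partial)


# Synthetic data (an assumption): one timestamp every 6 hours across 1997,
# each timestamp occurring twice.
start = datetime(1997, 1, 1)
rows = [start + timedelta(hours=6 * i) for i in range(4 * 365)] * 2
```

On this data, `count_group_first` produces 1460 groups from 2920 rows before the filter runs. With real event timestamps the number of groups would typically be much closer to the number of rows.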

      For this particular query, the ideal would be to push the filter into the intervals clause. Then we would not need to group by __timestamp. I am not sure why this is not happening.
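As a rough illustration of what pushing the filter into the intervals clause would amount to, the sketch below turns a "year = Y and month in (...)" condition into interval strings in the same start/end format that appears in the plans above. It is plain Python, not the adapter's implementation; `month_intervals` is a hypothetical helper, and it assumes each interval is half-open (start inclusive, end exclusive).

```python
from datetime import datetime


def month_intervals(year, months):
    """Return one 'start/end' interval string per requested month."""
    fmt = "%Y-%m-%dT%H:%M:%S.000"
    intervals = []
    for month in sorted(months):
        start = datetime(year, month, 1)
        # End is the first day of the following month; December rolls over.
        if month == 12:
            end = datetime(year + 1, 1, 1)
        else:
            end = datetime(year, month + 1, 1)
        intervals.append(f"{start.strftime(fmt)}/{end.strftime(fmt)}")
    return intervals
```

For the query in this issue, `month_intervals(1997, [4, 6])` yields the April 1997 and June 1997 ranges, which match the date bounds in the filter of the first plan. With those as the query intervals, a plain COUNT() with no grouping would suffice.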

      Nishant Bangarwa, slim bouguerra: how bad is the query with DruidAggregateFilterTransposeRule enabled, in your opinion? Is this a show-stopper for Calcite 1.12?


              People

              • Assignee: Julian Hyde
              • Reporter: Julian Hyde