Calcite / CALCITE-1706

DruidAggregateFilterTransposeRule causes very fine-grained aggregations to be pushed to Druid

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.12.0
    • Component/s: None
    • Labels: None

    Description

      Enabling DruidAggregateFilterTransposeRule may cause very fine-grained aggregations to be pushed to Druid.

      Running DruidAdapterIT.testFilterTimestamp, here is the previous plan (with DruidAggregateFilterTransposeRule disabled):

      EnumerableInterpreter
        BindableAggregate(group=[{}], C=[COUNT()])
          BindableFilter(condition=[AND(>=(/INT(Reinterpret($0), 86400000), 1997-01-01), <(/INT(Reinterpret($0), 86400000), 1998-01-01), OR(AND(>=(/INT(Reinterpret($0), 86400000), 1997-04-01), <(/INT(Reinterpret($0), 86400000), 1997-05-01)), AND(>=(/INT(Reinterpret($0), 86400000), 1997-06-01), <(/INT(Reinterpret($0), 86400000), 1997-07-01))))])
            DruidQuery(table=[[foodmart, foodmart]], intervals=[[1900-01-09T00:00:00.000/2992-01-10T00:00:00.000]], projects=[[$0]])
      

      Here is the plan with DruidAggregateFilterTransposeRule enabled, which is inferior in my opinion:

      EnumerableInterpreter
        BindableAggregate(group=[{}], C=[$SUM0($1)])
          BindableFilter(condition=[AND(=(EXTRACT_DATE(FLAG(YEAR), /INT(Reinterpret($0), 86400000)), 1997), OR(=(EXTRACT_DATE(FLAG(MONTH), /INT(Reinterpret($0), 86400000)), 4), =(EXTRACT_DATE(FLAG(MONTH), /INT(Reinterpret($0), 86400000)), 6)))])
            DruidQuery(table=[[foodmart, foodmart]], intervals=[[1900-01-09T00:00:00.000/2992-01-10T00:00:00.000]], groups=[{0}], aggs=[[COUNT()]])
      

      Note that the DruidQuery now groups by __timestamp (groups=[{0}]). Given that __timestamp has very high cardinality, is this an efficient operation for Druid?

      For this particular query, the ideal would be to push the filter into the intervals clause. Then we would not need to group by __timestamp. I am not sure why this is not happening.
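
      As a rough illustration of what pushing the filter into the intervals clause could look like, the sketch below turns the year and month conditions of this query (year 1997, months 4 and 6) into ISO-8601 intervals in the same textual form as the intervals attribute shown in the plans. The names IntervalSketch and monthIntervals are hypothetical and are not part of Calcite or the Druid adapter; the sketch assumes half-open [start, end) intervals and ignores time zones.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class IntervalSketch {
  /**
   * Hypothetical helper (not a Calcite API): builds one half-open
   * [start, end) interval string per requested month of the given year.
   */
  public static List<String> monthIntervals(int year, int... months) {
    List<String> intervals = new ArrayList<>();
    for (int month : months) {
      LocalDate start = LocalDate.of(year, month, 1);
      LocalDate end = start.plusMonths(1); // first day of the following month
      intervals.add(start + "T00:00:00.000/" + end + "T00:00:00.000");
    }
    return intervals;
  }

  public static void main(String[] args) {
    // Year 1997, months April and June, as in the filter above.
    System.out.println(monthIntervals(1997, 4, 6));
  }
}
```

      With intervals like these in place of the catch-all 1900/2992 interval, the DruidQuery could presumably compute COUNT() directly, without grouping by __timestamp.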

      Nishant Bangarwa, Slim Bouguerra: in your opinion, how bad is the query with DruidAggregateFilterTransposeRule enabled? Is this a show-stopper for Calcite 1.12?

          People

            julianhyde Julian Hyde