Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
Enabling DruidAggregateFilterTransposeRule may cause very fine-grained aggregations to be pushed to Druid.
Running DruidAdapterIT.testFilterTimestamp, here is the previous plan (with DruidAggregateFilterTransposeRule disabled):
EnumerableInterpreter
  BindableAggregate(group=[{}], C=[COUNT()])
    BindableFilter(condition=[AND(>=(/INT(Reinterpret($0), 86400000), 1997-01-01), <(/INT(Reinterpret($0), 86400000), 1998-01-01), OR(AND(>=(/INT(Reinterpret($0), 86400000), 1997-04-01), <(/INT(Reinterpret($0), 86400000), 1997-05-01)), AND(>=(/INT(Reinterpret($0), 86400000), 1997-06-01), <(/INT(Reinterpret($0), 86400000), 1997-07-01))))])
      DruidQuery(table=[[foodmart, foodmart]], intervals=[[1900-01-09T00:00:00.000/2992-01-10T00:00:00.000]], projects=[[$0]])
Here is the plan (in my opinion inferior) with DruidAggregateFilterTransposeRule enabled:
EnumerableInterpreter
  BindableAggregate(group=[{}], C=[$SUM0($1)])
    BindableFilter(condition=[AND(=(EXTRACT_DATE(FLAG(YEAR), /INT(Reinterpret($0), 86400000)), 1997), OR(=(EXTRACT_DATE(FLAG(MONTH), /INT(Reinterpret($0), 86400000)), 4), =(EXTRACT_DATE(FLAG(MONTH), /INT(Reinterpret($0), 86400000)), 6)))])
      DruidQuery(table=[[foodmart, foodmart]], intervals=[[1900-01-09T00:00:00.000/2992-01-10T00:00:00.000]], groups=[{0}], aggs=[[COUNT()]])
Note that the DruidQuery is now aggregating on __timestamp. Given that __timestamp has very high cardinality, is this an efficient operation for Druid?
For this particular query, the ideal plan would push the filter into the query's intervals clause; then we would not need to group by __timestamp at all. I am not sure why this is not happening.
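For illustration, if the filter were pushed into the intervals clause, the Druid query could be a plain timeseries query with no grouping, roughly like the following sketch (the exact field names and the timestamp column name are assumptions, not taken from the actual generated query):

```json
{
  "queryType": "timeseries",
  "dataSource": "foodmart",
  "granularity": "all",
  "intervals": ["1997-04-01/1997-05-01", "1997-06-01/1997-07-01"],
  "aggregations": [{"type": "count", "name": "C"}]
}
```

The two intervals encode the conjunction of the year-1997 bound with the April and June month ranges, so Druid can answer the count from its time index without a high-cardinality group-by.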
nishantbangarwa, bslim, How bad is the query with DruidAggregateFilterTransposeRule enabled, in your opinion? Is this a show-stopper for Calcite 1.12?
Issue Links
- relates to CALCITE-1436 AggregateNode NPE for aggregators other than SUM/COUNT (Closed)