Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
For queries that aggregate on the time dimension, or on a function of it such as FLOOR(__time TO DAY), as of the fix for CALCITE-1579 we generate a "groupBy" query that does not sort or apply a limit. It would be better (in the sense that Druid would do more of the work and Hive less) if we used an extractionFn to create a dimension that we can sort on.
In CALCITE-1578, nishantbangarwa gives the following example query:
{
  "queryType": "groupBy",
  "dataSource": "druid_tpcds_ss_sold_time_subset",
  "granularity": "ALL",
  "dimensions": [
    "i_brand_id",
    {
      "type": "extraction",
      "dimension": "__time",
      "outputName": "year",
      "extractionFn": {
        "type": "timeFormat",
        "granularity": "YEAR"
      }
    }
  ],
  "limitSpec": {
    "type": "default",
    "limit": 10,
    "columns": [
      { "dimension": "$f3", "direction": "ascending" }
    ]
  },
  "aggregations": [
    { "type": "longMax", "name": "$f2", "fieldName": "ss_quantity" },
    { "type": "doubleSum", "name": "$f3", "fieldName": "ss_wholesale_cost" }
  ],
  "intervals": [
    "1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"
  ]
}
and for DruidAdapterIT.testGroupByDaySortDescLimit, bslim suggests:
{
  "queryType": "groupBy",
  "dataSource": "foodmart",
  "granularity": "all",
  "dimensions": [
    "brand_name",
    {
      "type": "extraction",
      "dimension": "__time",
      "outputName": "day",
      "extractionFn": {
        "type": "timeFormat",
        "granularity": "DAY"
      }
    }
  ],
  "aggregations": [
    { "type": "longSum", "name": "S", "fieldName": "unit_sales" }
  ],
  "limitSpec": {
    "type": "default",
    "limit": 30,
    "columns": [
      { "dimension": "S", "direction": "ascending" }
    ]
  }
}
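For reference, the second Druid query above corresponds roughly to a SQL aggregation of this shape (a hypothetical sketch, not taken from the test source; table and column names are inferred from the foodmart JSON above, and the sort direction follows the limitSpec rather than the test name):

SELECT "brand_name",
       FLOOR("__time" TO DAY) AS "day",
       SUM("unit_sales") AS "S"
FROM "foodmart"
GROUP BY "brand_name", FLOOR("__time" TO DAY)
ORDER BY "S"
LIMIT 30;

With the extractionFn-based "day" dimension, Druid can evaluate the ORDER BY and LIMIT itself via the limitSpec, instead of returning unsorted groups for Hive to sort and truncate.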
Issue Links
- is related to CALCITE-1578 (Druid adapter: wrong semantics of topN query limit with granularity) - Closed