Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
For queries that aggregate on the time dimension, or on a function of it such as FLOOR(__time TO DAY), as of the fix for CALCITE-1579 we generate a "groupBy" query that does not sort or apply a limit. It would be better (in the sense that Druid would do more of the work and Hive less) if we used an extractionFn to create a dimension that we can sort on.
In CALCITE-1578, nishantbangarwa gives the following example query:
{
  "queryType": "groupBy",
  "dataSource": "druid_tpcds_ss_sold_time_subset",
  "granularity": "ALL",
  "dimensions": [
    "i_brand_id",
    {
      "type": "extraction",
      "dimension": "__time",
      "outputName": "year",
      "extractionFn": {
        "type": "timeFormat",
        "granularity": "YEAR"
      }
    }
  ],
  "limitSpec": {
    "type": "default",
    "limit": 10,
    "columns": [
      { "dimension": "$f3", "direction": "ascending" }
    ]
  },
  "aggregations": [
    { "type": "longMax", "name": "$f2", "fieldName": "ss_quantity" },
    { "type": "doubleSum", "name": "$f3", "fieldName": "ss_wholesale_cost" }
  ],
  "intervals": [
    "1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"
  ]
}
and for DruidAdapterIT.testGroupByDaySortDescLimit, bslim suggests:
{
  "queryType": "groupBy",
  "dataSource": "foodmart",
  "granularity": "all",
  "dimensions": [
    "brand_name",
    {
      "type": "extraction",
      "dimension": "__time",
      "outputName": "day",
      "extractionFn": {
        "type": "timeFormat",
        "granularity": "DAY"
      }
    }
  ],
  "aggregations": [
    { "type": "longSum", "name": "S", "fieldName": "unit_sales" }
  ],
  "limitSpec": {
    "type": "default",
    "limit": 30,
    "columns": [
      { "dimension": "S", "direction": "ascending" }
    ]
  }
}
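For reference, the second Druid query above corresponds roughly to a SQL aggregation of this shape (a hypothetical sketch, not taken from the test source; table and column names are inferred from the foodmart JSON above, and the sort direction follows the limitSpec rather than the test name):

SELECT "brand_name",
       FLOOR("__time" TO DAY) AS "day",
       SUM("unit_sales") AS "S"
FROM "foodmart"
GROUP BY "brand_name", FLOOR("__time" TO DAY)
ORDER BY "S"
LIMIT 30;

With the extractionFn-based "day" dimension, Druid can evaluate the ORDER BY and LIMIT itself via the limitSpec, instead of returning unsorted groups for Hive to sort and truncate.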
Issue Links
- is related to CALCITE-1578 (Druid adapter: wrong semantics of topN query limit with granularity) - Closed