Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1591

Druid adapter: Use "groupBy" query with extractionFn for time dimension

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • druid-adapter
    • None

    Description

      For queries that aggregate on the time dimension, or a function of it such as FLOOR(__time TO DAY), as of the fix for CALCITE-1579 we generate a "groupBy" query that does not sort or apply limit. It would be better (in the sense that Druid is doing more of the work, and Hive is doing less work) if we use an extractionFn to create a dimension that we can sort on.

      In CALCITE-1578, nishantbangarwa gives the following example query:

      {
        "queryType": "groupBy",
        "dataSource": "druid_tpcds_ss_sold_time_subset",
        "granularity": "ALL",
        "dimensions": [
          "i_brand_id",
          {
            "type" : "extraction",
            "dimension" : "__time",
            "outputName" :  "year",
            "extractionFn" : {
              "type" : "timeFormat",
              "granularity" : "YEAR"
            }
          }
        ],
        "limitSpec": {
          "type": "default",
          "limit": 10,
          "columns": [
            {
              "dimension": "$f3",
              "direction": "ascending"
            }
          ]
        },
        "aggregations": [
          {
            "type": "longMax",
            "name": "$f2",
            "fieldName": "ss_quantity"
          },
          {
            "type": "doubleSum",
            "name": "$f3",
            "fieldName": "ss_wholesale_cost"
          }
        ],
        "intervals": [
          "1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"
        ]
      }
      

      and for DruidAdapterIt. testGroupByDaySortDescLimit, bslim suggests

      {
        "queryType": "groupBy",
        "dataSource": "foodmart",
        "granularity": "all",
        "dimensions": [
          "brand_name",
          {
            "type": "extraction",
            "dimension": "__time",
            "outputName": "day",
            "extractionFn": {
              "type": "timeFormat",
              "granularity": "DAY"
            }
          }
        ],
        "aggregations": [
          {
            "type": "longSum",
            "name": "S",
            "fieldName": "unit_sales"
          }
        ],
        "limitSpec": {
          "type": "default",
          "limit": 30,
          "columns": [
            {
              "dimension": "S",
              "direction": "ascending"
            }
          ]
        }
      }
      

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              julianhyde Julian Hyde
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated: