Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1591

Druid adapter: Use "groupBy" query with extractionFn for time dimension

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: druid-adapter
    • Labels:
      None

      Description

      For queries that aggregate on the time dimension, or a function of it such as FLOOR(__time TO DAY), as of the fix for CALCITE-1579 we generate a "groupBy" query that does not sort or apply limit. It would be better (in the sense that Druid is doing more of the work, and Hive is doing less work) if we use an extractionFn to create a dimension that we can sort on.

      In CALCITE-1578, Nishant Bangarwa gives the following example query:

      {
        "queryType": "groupBy",
        "dataSource": "druid_tpcds_ss_sold_time_subset",
        "granularity": "ALL",
        "dimensions": [
          "i_brand_id",
          {
            "type" : "extraction",
            "dimension" : "__time",
            "outputName" :  "year",
            "extractionFn" : {
              "type" : "timeFormat",
              "granularity" : "YEAR"
            }
          }
        ],
        "limitSpec": {
          "type": "default",
          "limit": 10,
          "columns": [
            {
              "dimension": "$f3",
              "direction": "ascending"
            }
          ]
        },
        "aggregations": [
          {
            "type": "longMax",
            "name": "$f2",
            "fieldName": "ss_quantity"
          },
          {
            "type": "doubleSum",
            "name": "$f3",
            "fieldName": "ss_wholesale_cost"
          }
        ],
        "intervals": [
          "1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"
        ]
      }
      

      and for DruidAdapterIt. testGroupByDaySortDescLimit, Slim Bouguerra suggests

      {
        "queryType": "groupBy",
        "dataSource": "foodmart",
        "granularity": "all",
        "dimensions": [
          "brand_name",
          {
            "type": "extraction",
            "dimension": "__time",
            "outputName": "day",
            "extractionFn": {
              "type": "timeFormat",
              "granularity": "DAY"
            }
          }
        ],
        "aggregations": [
          {
            "type": "longSum",
            "name": "S",
            "fieldName": "unit_sales"
          }
        ],
        "limitSpec": {
          "type": "default",
          "limit": 30,
          "columns": [
            {
              "dimension": "S",
              "direction": "ascending"
            }
          ]
        }
      }
      

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                julianhyde Julian Hyde
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated: