Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1589

Druid adapter: timeseries query shows all days, even if no data

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.12.0
    • Component/s: druid
    • Labels:
      None

      Description

      Following query is transformed into timeseries Druid query which yields different results in Calcite vs Druid, since it will show all values for the given time granularity, even if there is no data for the given i_brand_id.

      SELECT floor_day(`__time`) as `granularity`, max(ss_quantity), sum(ss_wholesale_cost)
      FROM store_sales_sold_time_subset
      WHERE i_brand_id = 10001009
      GROUP BY floor_day(`__time`)
      ORDER BY `granularity`;
      OK
      1999-11-01 00:00:00	45	37.47
      1999-11-02 00:00:00	-9223372036854775808	0.0
      1999-11-03 00:00:00	-9223372036854775808	0.0
      1999-11-04 00:00:00	39	61.52
      1999-11-05 00:00:00	74	145.84
      1999-11-06 00:00:00	62	14.5
      1999-11-07 00:00:00	-9223372036854775808	0.0
      1999-11-08 00:00:00	5	34.08
      1999-11-09 00:00:00	-9223372036854775808	0.0
      1999-11-10 00:00:00	-9223372036854775808	0.0
      1999-11-11 00:00:00	-9223372036854775808	0.0
      1999-11-12 00:00:00	66	67.22
      1999-11-13 00:00:00	-9223372036854775808	0.0
      1999-11-14 00:00:00	-9223372036854775808	0.0
      1999-11-15 00:00:00	-9223372036854775808	0.0
      1999-11-16 00:00:00	60	96.37
      1999-11-17 00:00:00	50	79.11
      1999-11-18 00:00:00	-9223372036854775808	0.0
      1999-11-19 00:00:00	-9223372036854775808	0.0
      1999-11-20 00:00:00	-9223372036854775808	0.0
      1999-11-21 00:00:00	-9223372036854775808	0.0
      1999-11-22 00:00:00	-9223372036854775808	0.0
      1999-11-23 00:00:00	57	17.69
      1999-11-24 00:00:00	-9223372036854775808	0.0
      1999-11-25 00:00:00	-9223372036854775808	0.0
      1999-11-26 00:00:00	-9223372036854775808	0.0
      1999-11-27 00:00:00	86	91.59
      1999-11-28 00:00:00	-9223372036854775808	0.0
      1999-11-29 00:00:00	93	136.48
      1999-11-30 00:00:00	-9223372036854775808	0.0
      SELECT floor_day(`__time`) as `granularity`, max(ss_quantity), sum(ss_wholesale_cost)
      FROM store_sales_sold_time_subset_calcite
      WHERE i_brand_id = 10001009
      GROUP BY floor_day(`__time`)
      ORDER BY `granularity`;
      OK
      1999-11-01 00:00:00	45	37.47
      1999-11-04 00:00:00	39	61.52
      1999-11-05 00:00:00	74	145.84
      1999-11-06 00:00:00	62	14.5
      1999-11-08 00:00:00	5	34.08
      1999-11-12 00:00:00	66	67.22
      1999-11-16 00:00:00	60	96.36999999999999
      1999-11-17 00:00:00	50	79.11
      1999-11-23 00:00:00	57	17.689999999999998
      1999-11-27 00:00:00	86	91.59
      1999-11-29 00:00:00	93	136.48
      

        Issue Links

          Activity

          Hide
          julianhyde Julian Hyde added a comment -

          Resolved in release 1.12.0 (2017-03-24).

          Show
          julianhyde Julian Hyde added a comment - Resolved in release 1.12.0 (2017-03-24).
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Thanks for taking a look Julian Hyde. Added new test case that covers this issue.

          Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/0187cfc.

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Thanks for taking a look Julian Hyde . Added new test case that covers this issue. Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/0187cfc .
          Hide
          julianhyde Julian Hyde added a comment -

          Looks good, but can you add a test case to DruidAdapterIT that returns the wrong results if this bug is not fixed? I'm guessing that a very restrictive WHERE clause would do it.

          Show
          julianhyde Julian Hyde added a comment - Looks good, but can you add a test case to DruidAdapterIT that returns the wrong results if this bug is not fixed? I'm guessing that a very restrictive WHERE clause would do it.
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Julian Hyde, I have created a PR in https://github.com/apache/calcite/pull/358 . Fix is straightforward. Could you take a look? Thanks

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Julian Hyde , I have created a PR in https://github.com/apache/calcite/pull/358 . Fix is straightforward. Could you take a look? Thanks
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Gian Merlino, in any case thanks for all the help disambiguating Druid query semantics!

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Gian Merlino , in any case thanks for all the help disambiguating Druid query semantics!
          Hide
          gian Gian Merlino added a comment -

          Ah, nevermind, I see Nishant already told you that

          Show
          gian Gian Merlino added a comment - Ah, nevermind, I see Nishant already told you that
          Hide
          gian Gian Merlino added a comment -

          You want "skipEmptyBuckets" : true in your query context.

          Show
          gian Gian Merlino added a comment - You want "skipEmptyBuckets" : true in your query context.
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Nishant Bangarwa suggested the following:

          For timeseries druid fills the empty buckets using the default values for the aggregators.
          To change this behavior we can set a flag in query context to skipEmptyBuckets.
          Below Druid query should return expected results -

           
          cat bug-72369.json
          {
            "queryType":"timeseries",
            "dataSource": "druid_tpcds_ss_sold_time_subset",
            "descending": false,
            "granularity": "DAY",
            "filter": {
              "type": "selector",
              "dimension": "i_brand_id",
              "value": "10001009"
            },
            "aggregations": [
              {
                "type": "longMax",
                "name": "$f1",
                "fieldName": "ss_quantity"
              },
              {
                "type": "doubleSum",
                "name": "$f2",
                "fieldName": "ss_wholesale_cost"
              },
              {
                "type" : "count",
                "name" : "count"
              }
            ],
            "intervals": [
              "1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"
            ],
            "context" : {
               "skipEmptyBuckets" : true
            }
          }
          
          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Nishant Bangarwa suggested the following: For timeseries druid fills the empty buckets using the default values for the aggregators. To change this behavior we can set a flag in query context to skipEmptyBuckets. Below Druid query should return expected results - cat bug-72369.json { "queryType" : "timeseries" , "dataSource" : "druid_tpcds_ss_sold_time_subset" , "descending" : false , "granularity" : "DAY" , "filter" : { "type" : "selector" , "dimension" : "i_brand_id" , "value" : "10001009" }, "aggregations" : [ { "type" : "longMax" , "name" : "$f1" , "fieldName" : "ss_quantity" }, { "type" : "doubleSum" , "name" : "$f2" , "fieldName" : "ss_wholesale_cost" }, { "type" : "count" , "name" : "count" } ], "intervals" : [ "1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z" ], "context" : { "skipEmptyBuckets" : true } }

            People

            • Assignee:
              jcamachorodriguez Jesus Camacho Rodriguez
              Reporter:
              jcamachorodriguez Jesus Camacho Rodriguez
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development