Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-15635

Hive/Druid integration: timeseries query shows all days, even if no data

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 3.0.0
    • Component/s: Druid integration
    • Labels:
      None
    • Target Version/s:

      Description

      We should have consistent results on Druid vs Hive. However, following query is transformed into timeseries Druid query which yields different results in Druid, since it will show all values for the given time granularity, even if there is no data for the given i_brand_id.

      In Druid:

      SELECT floor_day(`__time`) as `granularity`, max(ss_quantity), sum(ss_wholesale_cost)
      FROM store_sales_sold_time_subset
      WHERE i_brand_id = 10001009
      GROUP BY floor_day(`__time`)
      ORDER BY `granularity`;
      OK
      1999-11-01 00:00:00	45	37.47
      1999-11-02 00:00:00	-9223372036854775808	0.0
      1999-11-03 00:00:00	-9223372036854775808	0.0
      1999-11-04 00:00:00	39	61.52
      1999-11-05 00:00:00	74	145.84
      1999-11-06 00:00:00	62	14.5
      1999-11-07 00:00:00	-9223372036854775808	0.0
      1999-11-08 00:00:00	5	34.08
      1999-11-09 00:00:00	-9223372036854775808	0.0
      1999-11-10 00:00:00	-9223372036854775808	0.0
      1999-11-11 00:00:00	-9223372036854775808	0.0
      1999-11-12 00:00:00	66	67.22
      1999-11-13 00:00:00	-9223372036854775808	0.0
      1999-11-14 00:00:00	-9223372036854775808	0.0
      1999-11-15 00:00:00	-9223372036854775808	0.0
      1999-11-16 00:00:00	60	96.37
      1999-11-17 00:00:00	50	79.11
      1999-11-18 00:00:00	-9223372036854775808	0.0
      1999-11-19 00:00:00	-9223372036854775808	0.0
      1999-11-20 00:00:00	-9223372036854775808	0.0
      1999-11-21 00:00:00	-9223372036854775808	0.0
      1999-11-22 00:00:00	-9223372036854775808	0.0
      1999-11-23 00:00:00	57	17.69
      1999-11-24 00:00:00	-9223372036854775808	0.0
      1999-11-25 00:00:00	-9223372036854775808	0.0
      1999-11-26 00:00:00	-9223372036854775808	0.0
      1999-11-27 00:00:00	86	91.59
      1999-11-28 00:00:00	-9223372036854775808	0.0
      1999-11-29 00:00:00	93	136.48
      1999-11-30 00:00:00	-9223372036854775808	0.0
      

      In Hive:

      SELECT floor_day(`__time`) as `granularity`, max(ss_quantity), sum(ss_wholesale_cost)
      FROM store_sales_sold_time_subset_hive
      WHERE i_brand_id = 10001009
      GROUP BY floor_day(`__time`)
      ORDER BY `granularity`;
      OK
      1999-11-01 00:00:00	45	37.47
      1999-11-04 00:00:00	39	61.52
      1999-11-05 00:00:00	74	145.84
      1999-11-06 00:00:00	62	14.5
      1999-11-08 00:00:00	5	34.08
      1999-11-12 00:00:00	66	67.22
      1999-11-16 00:00:00	60	96.36999999999999
      1999-11-17 00:00:00	50	79.11
      1999-11-23 00:00:00	57	17.689999999999998
      1999-11-27 00:00:00	86	91.59
      1999-11-29 00:00:00	93	136.48
      

      Probably we should handle this in the timeseries record reader.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                jcamachorodriguez Jesus Camacho Rodriguez
                Reporter:
                jcamachorodriguez Jesus Camacho Rodriguez
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: