Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1334

Convert predicates on EXTRACT function calls into date ranges

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.9.0
    • Component/s: None
    • Labels:

      Description

      We would like to convert predicates on date dimension columns into date ranges. This is particularly useful for Druid, which has a single timestamp column.

      Consider the case of a materialized view

      SELECT sales.*, product.*, time_by_day.*
      FROM sales
      JOIN product USING (product_id)
      JOIN time_by_day USING (time_id)
      

      that corresponds to a Druid table

      sales_product_time(
        product_id int not null,
        time_id int not null,
        units int not null,
        the_year int not null,
        the_quarter int not null,
        the_month int not null,
        the_timestamp timestamp not null,
        product_name varchar(20) not null)
      

      And suppose we have the following check constraints:

      • CHECK the_year = EXTRACT(YEAR FROM the_timestamp)
      • CHECK the_month = EXTRACT(MONTH FROM the_timestamp)

      Given a query

      SELECT product_id, count(*)
      FROM sales
      JOIN product USING (product_id)
      JOIN time_by_day USING (time_id)
      WHERE the_year = 2016
      AND the_month IN (4, 5, 6)
      

      we would like to transform it into the following query to be run against Druid:

      SELECT product_id, count(*)
      FROM sales_product_time
      WHERE the_timestamp BETWEEN '2016-04-01' AND '2016-06-30'
      

      Druid can handle timestamp ranges (or disjoint sets of ranges) very efficiently.

      I believe we can write a rule that knows the check constraints and also knows the properties of the EXTRACT function:

      1. Apply check constraints to convert WHERE year = ... to WHERE EXTRACT(YEAR FROM the_timestamp) = ..., etc.
      2. EXTRACT(YEAR FROM ...) is monotonic, therefore we can deduce the range of the_timestamp values such that EXTRACT(YEAR FROM the_timestamp) returns 2016.
      3. Then we need to use the fact that EXTRACT(MONTH FROM the_timestamp) is monotonic if the_timestamp is bounded within a particular year.
      4. And we need to merge month ranges somehow.

        Issue Links

          Activity

          Hide
          julianhyde Julian Hyde added a comment -

          Jesus Camacho Rodriguez, Do you think this is the right approach? Will it be re-usable in Hive? (I don't know whether Hive has EXTRACT, for instance.)

          Show
          julianhyde Julian Hyde added a comment - Jesus Camacho Rodriguez , Do you think this is the right approach? Will it be re-usable in Hive? (I don't know whether Hive has EXTRACT, for instance.)
          Hide
          julianhyde Julian Hyde added a comment -
          Show
          julianhyde Julian Hyde added a comment - I have created https://github.com/apache/calcite/pull/277 . Jesus Camacho Rodriguez , can you please review?
          Show
          julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/87435a9b .
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - - edited

          Fixed up in http://git-wip-us.apache.org/repos/asf/calcite/commit/63a7a1a .

          Changed Calendar.getInstance();, which is user default timezone dependent, by Calendar.getInstance(TimeZone.getTimeZone("UTC"));.

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - - edited Fixed up in http://git-wip-us.apache.org/repos/asf/calcite/commit/63a7a1a . Changed Calendar.getInstance(); , which is user default timezone dependent, by Calendar.getInstance(TimeZone.getTimeZone("UTC")); .
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Resolved in release 1.9.0 (2016-09-22)

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Resolved in release 1.9.0 (2016-09-22)
          Hide
          nishantbangarwa Nishant Bangarwa added a comment - - edited

          looks like this is still an issue,
          With the latest trunk the query being sent to Druid for DruidAdapterIT.testFilterTimestamp is as follows
          Notice the interval being sent to Druid I expect that to be "1997-04-01/1997-06-30" -

          {
            "queryType": "select",
            "dataSource": "foodmart",
            "descending": false,
            "intervals": [
              "1900-01-09T00:00:00.000/2992-01-10T00:00:00.000"
            ],
            "dimensions": [
              "product_id",
              "brand_name",
              "product_name",
              "SKU",
              "SRP",
              "gross_weight",
              "net_weight",
              "recyclable_package",
              "low_fat",
              "units_per_case",
              "cases_per_pallet",
              "shelf_width",
              "shelf_height",
              "shelf_depth",
              "product_class_id",
              "product_subcategory",
              "product_category",
              "product_department",
              "product_family",
              "customer_id",
              "account_num",
              "lname",
              "fname",
              "mi",
              "address1",
              "address2",
              "address3",
              "address4",
              "city",
              "state_province",
              "postal_code",
              "country",
              "customer_region_id",
              "phone1",
              "phone2",
              "birthdate",
              "marital_status",
              "yearly_income",
              "gender",
              "total_children",
              "num_children_at_home",
              "education",
              "date_accnt_opened",
              "member_card",
              "occupation",
              "houseowner",
              "num_cars_owned",
              "fullname",
              "promotion_id",
              "promotion_district_id",
              "promotion_name",
              "media_type",
              "cost",
              "start_date",
              "end_date",
              "store_id",
              "store_type",
              "region_id",
              "store_name",
              "store_number",
              "store_street_address",
              "store_city",
              "store_state",
              "store_postal_code",
              "store_country",
              "store_manager",
              "store_phone",
              "store_fax",
              "first_opened_date",
              "last_remodel_date",
              "store_sqft",
              "grocery_sqft",
              "frozen_sqft",
              "meat_sqft",
              "coffee_bar",
              "video_store",
              "salad_bar",
              "prepared_food",
              "florist",
              "time_id",
              "the_day",
              "the_month",
              "the_year",
              "day_of_month",
              "week_of_year",
              "month_of_year",
              "quarter",
              "fiscal_period"
            ],
            "metrics": [
              "unit_sales",
              "store_sales",
              "store_cost"
            ],
            "granularity": "all",
            "pagingSpec": {
              "threshold": 16384,
              "fromNext": true
            },
            "context": {
              "druid.query.fetch": false
            }
          }
          
          Show
          nishantbangarwa Nishant Bangarwa added a comment - - edited looks like this is still an issue, With the latest trunk the query being sent to Druid for DruidAdapterIT.testFilterTimestamp is as follows Notice the interval being sent to Druid I expect that to be "1997-04-01/1997-06-30" - { "queryType" : "select" , "dataSource" : "foodmart" , "descending" : false , "intervals" : [ "1900-01-09T00:00:00.000/2992-01-10T00:00:00.000" ], "dimensions" : [ "product_id" , "brand_name" , "product_name" , "SKU" , "SRP" , "gross_weight" , "net_weight" , "recyclable_package" , "low_fat" , "units_per_case" , "cases_per_pallet" , "shelf_width" , "shelf_height" , "shelf_depth" , "product_class_id" , "product_subcategory" , "product_category" , "product_department" , "product_family" , "customer_id" , "account_num" , "lname" , "fname" , "mi" , "address1" , "address2" , "address3" , "address4" , "city" , "state_province" , "postal_code" , "country" , "customer_region_id" , "phone1" , "phone2" , "birthdate" , "marital_status" , "yearly_income" , "gender" , "total_children" , "num_children_at_home" , "education" , "date_accnt_opened" , "member_card" , "occupation" , "houseowner" , "num_cars_owned" , "fullname" , "promotion_id" , "promotion_district_id" , "promotion_name" , "media_type" , "cost" , "start_date" , "end_date" , "store_id" , "store_type" , "region_id" , "store_name" , "store_number" , "store_street_address" , "store_city" , "store_state" , "store_postal_code" , "store_country" , "store_manager" , "store_phone" , "store_fax" , "first_opened_date" , "last_remodel_date" , "store_sqft" , "grocery_sqft" , "frozen_sqft" , "meat_sqft" , "coffee_bar" , "video_store" , "salad_bar" , "prepared_food" , "florist" , "time_id" , "the_day" , "the_month" , "the_year" , "day_of_month" , "week_of_year" , "month_of_year" , "quarter" , "fiscal_period" ], "metrics" : [ "unit_sales" , "store_sales" , "store_cost" ], "granularity" : "all" , "pagingSpec" : { "threshold" : 16384, "fromNext" : true }, "context" : { "druid.query.fetch" : false } }
          Hide
          julianhyde Julian Hyde added a comment -

          Closing again. The issue, as stated, was fixed. Nishant Bangarwa, Can you log a new issue for pushing the interval set on the __timestamp column into the Druid query?

          Show
          julianhyde Julian Hyde added a comment - Closing again. The issue, as stated, was fixed. Nishant Bangarwa , Can you log a new issue for pushing the interval set on the __timestamp column into the Druid query?

            People

            • Assignee:
              julianhyde Julian Hyde
              Reporter:
              julianhyde Julian Hyde
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development