Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1334

Convert predicates on EXTRACT function calls into date ranges

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.9.0
    • Component/s: None
    • Labels:

      Description

      We would like to convert predicates on date dimension columns into date ranges. This is particularly useful for Druid, which has a single timestamp column.

      Consider the case of a materialized view

      SELECT sales.*, product.*, time_by_day.*
      FROM sales
      JOIN product USING (product_id)
      JOIN time_by_day USING (time_id)
      

      that corresponds to a Druid table

      sales_product_time(
        product_id int not null,
        time_id int not null,
        units int not null,
        the_year int not null,
        the_quarter int not null,
        the_month int not null,
        the_timestamp timestamp not null,
        product_name varchar(20) not null)
      

      And suppose we have the following check constraints:

      • CHECK the_year = EXTRACT(YEAR FROM the_timestamp)
      • CHECK the_month = EXTRACT(MONTH FROM the_timestamp)

      Given a query

      SELECT product_id, count(*)
      FROM sales
      JOIN product USING (product_id)
      JOIN time_by_day USING (time_id)
      WHERE the_year = 2016
      AND the_month IN (4, 5, 6)
      

      we would like to transform it into the following query to be run against Druid:

      SELECT product_id, count(*)
      FROM sales_product_time
      WHERE the_timestamp BETWEEN '2016-04-01' AND '2016-06-30'
      

      Druid can handle timestamp ranges (or disjoint sets of ranges) very efficiently.

      I believe we can write a rule that knows the check constraints and also knows the properties of the EXTRACT function:

      1. Apply check constraints to convert WHERE year = ... to WHERE EXTRACT(YEAR FROM the_timestamp) = ..., etc.
      2. EXTRACT(YEAR FROM ...) is monotonic, therefore we can deduce the range of the_timestamp values such that EXTRACT(YEAR FROM the_timestamp) returns 2016.
      3. Then we need to use the fact that EXTRACT(MONTH FROM the_timestamp) is monotonic if the_timestamp is bounded within a particular year.
      4. And we need to merge month ranges somehow.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                julianhyde Julian Hyde
                Reporter:
                julianhyde Julian Hyde
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: