Description
We would like to convert predicates on date dimension columns into date ranges. This is particularly useful for Druid, which has a single timestamp column.
Consider the case of a materialized view
SELECT sales.*, product.*, time_by_day.* FROM sales JOIN product USING (product_id) JOIN time_by_day USING (time_id)
that corresponds to a Druid table
sales_product_time( product_id int not null, time_id int not null, units int not null, the_year int not null, the_quarter int not null, the_month int not null, the_timestamp timestamp not null, product_name varchar(20) not null)
And suppose we have the following check constraints:
- CHECK the_year = EXTRACT(YEAR FROM the_timestamp)
- CHECK the_month = EXTRACT(MONTH FROM the_timestamp)
Given a query
SELECT product_id, count(*) FROM sales JOIN product USING (product_id) JOIN time_by_day USING (time_id) WHERE the_year = 2016 AND the_month IN (4, 5, 6)
we would like to transform it into the following query to be run against Druid:
SELECT product_id, count(*) FROM sales_product_time WHERE the_timestamp BETWEEN '2016-04-01' AND '2016-06-30'
Druid can handle timestamp ranges (or disjoint sets of ranges) very efficiently.
I believe we can write a rule that knows the check constraints and also knows the properties of the EXTRACT function:
1. Apply check constraints to convert WHERE year = ... to WHERE EXTRACT(YEAR FROM the_timestamp) = ..., etc.
2. EXTRACT(YEAR FROM ...) is monotonic, therefore we can deduce the range of the_timestamp values such that EXTRACT(YEAR FROM the_timestamp) returns 2016.
3. Then we need to use the fact that EXTRACT(MONTH FROM the_timestamp) is monotonic if the_timestamp is bounded within a particular year.
4. And we need to merge month ranges somehow.
Attachments
Issue Links
- blocks
-
CALCITE-1356 Release Calcite 1.9.0
- Closed
- is related to
-
HIVE-14466 Extend Calcite capabilities to transform plan into Druid query
- Resolved
- relates to
-
CALCITE-1358 Push filters on time dimension to Druid
- Closed
- links to