Description
I am working on the integration of materialized view rewriting within Hive.
Once a view matches an operator plan, rewriting is split broadly into two steps. The first step verifies that the input to the root operator of the matched plan is equivalent to, or contained within, the input to the root operator of the query representing the view. The second step triggers a unify rule, which tries to rewrite the matched operator tree into a scan on the view, possibly with some additional operators to compute the exact results needed by the query (for example, a Project that alters the column order, an additional Filter on the view, or an additional Join operation).
If we focus on step 1 (checking equivalence/containment), I would like to extend the metadata providers in Calcite to give us more information about the matched (sub)plan. In particular, I am thinking of:
- Expression column origin. Currently Calcite can provide the column origins for a certain column and whether it is derived or not. However, we would need to obtain the expression that generated a certain column. This expression should contain references to the input tables. For instance, given a column c, the new metadata provider would report that it was generated by the expression A.a + B.b.
- All predicates. Currently Calcite can extract the predicates that hold on a RelNode output (we can think of them as constraints on the output). However, I would like to extract all predicates that have been applied within a given RelNode (sub)plan. Since the columns they reference might not be part of the output, the expressions should contain references to the input tables. For instance, the new metadata provider might return the expression A.a + B.b > C.c AND D.d = 100.
- PK-FK relationship. I do not plan to implement this one immediately. However, exposing this information (when it is provided) can help us trigger more rewritings that contain join operators. Thus, I was wondering whether it is worth adding.
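To make the "expression column origin" idea concrete, here is a minimal sketch of the substitution it implies. It is not Calcite code: `InputRef`, `TableColumnRef`, `Call` and `toTableRefs` are hypothetical names invented for illustration, and the real provider would work on RexNode trees. The sketch assumes the lineage of each input column is already known and only shows how a projected expression such as $0 + $1 is rewritten in terms of table columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical mini expression model; none of these types are Calcite classes. */
public class LineageSketch {

  /** Marker interface for expression nodes. */
  public interface Expr {}

  /** Reference to the i-th output column of an operator's input, printed as $i. */
  public static final class InputRef implements Expr {
    final int index;
    public InputRef(int index) { this.index = index; }
    @Override public String toString() { return "$" + index; }
  }

  /** Reference to a base table column: qualified table name plus column index. */
  public static final class TableColumnRef implements Expr {
    final String table;
    final int index;
    public TableColumnRef(String table, int index) {
      this.table = table;
      this.index = index;
    }
    @Override public String toString() { return table + ".$" + index; }
  }

  /** Operator call, printed infix, e.g. (x + y). */
  public static final class Call implements Expr {
    final String op;
    final List<Expr> operands;
    public Call(String op, List<Expr> operands) {
      this.op = op;
      this.operands = operands;
    }
    @Override public String toString() {
      StringBuilder sb = new StringBuilder("(");
      for (int i = 0; i < operands.size(); i++) {
        if (i > 0) {
          sb.append(' ').append(op).append(' ');
        }
        sb.append(operands.get(i));
      }
      return sb.append(')').toString();
    }
  }

  /**
   * Replaces every InputRef in expr with the lineage of that input column.
   * inputLineage.get(i) is the table-level expression that produces input column i.
   */
  public static Expr toTableRefs(Expr expr, List<Expr> inputLineage) {
    if (expr instanceof InputRef) {
      return inputLineage.get(((InputRef) expr).index);
    }
    if (expr instanceof Call) {
      Call call = (Call) expr;
      List<Expr> rewritten = new ArrayList<>();
      for (Expr operand : call.operands) {
        rewritten.add(toTableRefs(operand, inputLineage));
      }
      return new Call(call.op, rewritten);
    }
    return expr; // already expressed over table columns
  }

  public static void main(String[] args) {
    // Assumption: the input (say, a join of A and B) exposes A.a as $0 and B.b as $1,
    // where a and b are column 0 of their respective tables.
    List<Expr> inputLineage = Arrays.<Expr>asList(
        new TableColumnRef("A", 0), new TableColumnRef("B", 0));
    Expr c = new Call("+", Arrays.<Expr>asList(new InputRef(0), new InputRef(1)));
    System.out.println(toTableRefs(c, inputLineage)); // prints (A.$0 + B.$0)
  }
}
```

The same substitution, applied to filter and join conditions instead of projected columns, would yield predicates expressed over table columns, which is what the "all predicates" item asks for.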
Once this information is available, we can rely on it to implement logic similar to [1] to check whether a given (sub)plan is equivalent/contained within a given view.
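As a rough illustration of how such metadata could feed a containment check, the sketch below compares predicate conjuncts syntactically. It is a deliberately simplified stand-in, not the algorithm from [1] and not Calcite code: `ContainmentSketch` and `residualPredicates` are hypothetical names, predicates are modelled as plain strings already normalized over table columns, and both plans are assumed to read the same tables. Under those assumptions, if every conjunct of the view also appears in the query, the query rows are contained in the view rows, and the remaining query conjuncts form the compensation Filter to apply on top of the view scan.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical, purely syntactic containment check over predicate conjuncts. */
public class ContainmentSketch {

  /**
   * Returns the conjuncts that must still be applied on top of the view
   * if the query is contained in it, or an empty Optional if this simple
   * test cannot prove containment.
   */
  public static Optional<Set<String>> residualPredicates(
      Set<String> queryConjuncts, Set<String> viewConjuncts) {
    // The view must not filter out rows that the query needs.
    if (!queryConjuncts.containsAll(viewConjuncts)) {
      return Optional.empty();
    }
    Set<String> residual = new LinkedHashSet<>(queryConjuncts);
    residual.removeAll(viewConjuncts);
    return Optional.of(residual);
  }

  public static void main(String[] args) {
    Set<String> query = new LinkedHashSet<>(
        Arrays.asList("A.a + B.b > C.c", "D.d = 100"));
    Set<String> view = new LinkedHashSet<>(
        Arrays.asList("A.a + B.b > C.c"));
    // The view keeps a superset of the rows; D.d = 100 remains as compensation.
    System.out.println(residualPredicates(query, view).get()); // prints [D.d = 100]
    // Swapping the roles fails: the "view" would have dropped rows the query needs.
    System.out.println(residualPredicates(view, query).isPresent()); // prints false
  }
}
```

A real implementation would need semantic rather than string comparison (for example, recognizing that x > 10 implies x > 5), which is where equivalence classes and range reasoning come in.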
One question I have is how to represent the table columns as a RexNode, since I think a RexNode is the easiest form for the new metadata providers to return. I checked RexPatternFieldRef and I think it will meet our requirements: alpha would be the qualified table name, while the index would be the column index within the table. Thoughts?
I have started working on this and will provide a patch shortly; feedback is greatly appreciated.
[1] ftp://ftp10.us.freebsd.org/users/azhang/disc/SIGMOD/pdf-files/331/202-optimizing.pdf
Issue Links
- is depended upon by
  - CALCITE-1731 Rewriting of queries using materialized views with joins and aggregates (Closed)
- is related to
  - CALCITE-2189 RelMdAllPredicates fast bail out creates mismatch with RelMdTableReferences (Closed)