[CALCITE-1682] New metadata providers for expression column origin and all predicates in plan - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.12.0
Fix Version/s: 1.13.0
Component/s: core
Labels: None

Description

I am working on the integration of materialized view rewriting within Hive.

Once a view matches an operator plan, rewriting is roughly split into two steps. The first step verifies that the input to the root operator of the matched plan is equivalent to, or contained within, the input to the root operator of the query representing the view. The second step triggers a unify rule, which tries to rewrite the matched operator tree into a scan on the view, possibly followed by additional operators that compute the exact results needed by the query (for example, a Project that alters the column order, an additional Filter on the view, or an additional Join operation).

If we focus on step 1, checking equivalence/containment, I would like to extend the metadata providers in Calcite to give us more information about the matched (sub)plan. In particular, I am thinking of:

Expression column origin. Currently Calcite can provide the column origins for a given column and whether it is derived or not. However, we would need to obtain the expression that generated a given column. This expression should contain references to the input tables. For instance, given a column c, the new metadata provider would return that it was generated by the expression A.a + B.b.
All predicates. Currently Calcite can extract the predicates that hold on the output of a RelNode (we can think of them as constraints on the output). However, I would like to extract all predicates that have been applied anywhere in a given RelNode (sub)plan. Since the columns they reference might not be part of the output, the expressions should contain references to the input tables. For instance, the new metadata provider might return the expressions A.a + B.b > C.c AND D.d = 100.
PK-FK relationship. I do not plan to implement this one immediately. However, exposing this information (when it is provided) can help us trigger more rewritings that involve join operators. Thus, I was wondering whether it is worth adding.
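
As a rough illustration of the first two items, the following self-contained sketch computes an expression origin and the list of all predicates over a toy plan. It does not use Calcite: the classes PlanNode, Scan, Filter, Join, Project and LineageDemo are hypothetical, and expressions are plain strings in which "$i" refers to the i-th input column. It is only meant to show why both results are expressed in terms of base-table columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy plan node (hypothetical; not a Calcite class). */
abstract class PlanNode {
  private static final Pattern REF = Pattern.compile("\\$(\\d+)");

  /** Number of output columns. */
  abstract int width();
  /** Expression over base-table columns that produces output column i. */
  abstract String lineage(int i);
  /** Predicates applied anywhere in this subplan, over base-table columns. */
  abstract List<String> allPredicates();

  /** Replaces each "$i" in expr with the lineage of this node's output column i. */
  String resolve(String expr) {
    Matcher m = REF.matcher(expr);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
      String origin = lineage(Integer.parseInt(m.group(1)));
      if (origin.contains(" ")) origin = "(" + origin + ")"; // keep nesting explicit
      m.appendReplacement(sb, Matcher.quoteReplacement(origin));
    }
    m.appendTail(sb);
    return sb.toString();
  }
}

class Scan extends PlanNode {
  final String table;
  final List<String> cols;
  Scan(String table, String... cols) { this.table = table; this.cols = Arrays.asList(cols); }
  int width() { return cols.size(); }
  String lineage(int i) { return table + "." + cols.get(i); }
  List<String> allPredicates() { return new ArrayList<>(); }
}

class Filter extends PlanNode {
  final PlanNode input;
  final String condition; // refers to the input's output columns
  Filter(PlanNode input, String condition) { this.input = input; this.condition = condition; }
  int width() { return input.width(); }
  String lineage(int i) { return input.lineage(i); }
  List<String> allPredicates() {
    List<String> ps = input.allPredicates();
    ps.add(input.resolve(condition));
    return ps;
  }
}

class Join extends PlanNode {
  final PlanNode left, right;
  final String condition; // refers to left columns followed by right columns
  Join(PlanNode left, PlanNode right, String condition) {
    this.left = left; this.right = right; this.condition = condition;
  }
  int width() { return left.width() + right.width(); }
  String lineage(int i) {
    return i < left.width() ? left.lineage(i) : right.lineage(i - left.width());
  }
  List<String> allPredicates() {
    List<String> ps = left.allPredicates();
    ps.addAll(right.allPredicates());
    ps.add(resolve(condition));
    return ps;
  }
}

class Project extends PlanNode {
  final PlanNode input;
  final List<String> exprs; // each refers to the input's output columns
  Project(PlanNode input, String... exprs) { this.input = input; this.exprs = Arrays.asList(exprs); }
  int width() { return exprs.size(); }
  String lineage(int i) { return input.resolve(exprs.get(i)); }
  List<String> allPredicates() { return input.allPredicates(); }
}

class LineageDemo {
  /** Project[$0 + $2] over Filter[$2 > 100] over Join[A(a, k), B(b, k) on $1 = $3]. */
  static PlanNode samplePlan() {
    PlanNode join = new Join(new Scan("A", "a", "k"), new Scan("B", "b", "k"), "$1 = $3");
    return new Project(new Filter(join, "$2 > 100"), "$0 + $2");
  }

  public static void main(String[] args) {
    PlanNode plan = samplePlan();
    System.out.println(plan.lineage(0));      // A.a + B.b
    System.out.println(plan.allPredicates()); // [A.k = B.k, B.b > 100]
  }
}
```

Note that the join key k does not appear in the Project output, yet the predicate A.k = B.k is still reported because it is phrased against the tables rather than against the output columns.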

Once this information is available, we can rely on it to implement logic similar to [1] to check whether a given (sub)plan is equivalent/contained within a given view.
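
To make the intended use concrete, here is a deliberately simplified containment check. It is not the algorithm from [1]: it assumes predicates are already normalized into conjunct strings over base-table columns, treats the set of scanned tables as a plain set, and only recognizes containment when every view predicate appears verbatim among the query predicates. The class ContainmentCheck and its methods are hypothetical names used for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical, purely syntactic containment test (not a Calcite class). */
class ContainmentCheck {
  /**
   * If the query is contained in the view, returns the query predicates that
   * the view does not already enforce; otherwise returns an empty Optional.
   */
  static Optional<Set<String>> residual(Set<String> queryTables, Set<String> queryPreds,
                                        Set<String> viewTables, Set<String> viewPreds) {
    // Sufficient (not necessary) condition: same tables, and the view filters
    // no more strictly than the query does.
    if (!queryTables.equals(viewTables) || !queryPreds.containsAll(viewPreds)) {
      return Optional.empty();
    }
    Set<String> rest = new LinkedHashSet<>(queryPreds);
    rest.removeAll(viewPreds);
    return Optional.of(rest);
  }

  static Set<String> setOf(String... items) {
    return new LinkedHashSet<>(Arrays.asList(items));
  }

  public static void main(String[] args) {
    Set<String> tables = setOf("A", "B");
    // View: A join B on k. Query: same join plus a filter on B.b.
    System.out.println(residual(tables, setOf("A.k = B.k", "B.b > 100"),
                                tables, setOf("A.k = B.k")));  // Optional[[B.b > 100]]
    // A view that filters on B.b cannot answer a query that does not.
    System.out.println(residual(tables, setOf("A.k = B.k"),
                                tables, setOf("A.k = B.k", "B.b > 100")));  // Optional.empty
  }
}
```

The residual predicates correspond to the additional Filter that step 2 would place on top of the view scan, provided the columns they reference are available in the view output.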

One question I have is how to represent the table columns as a RexNode, as I think that is the easiest form for the new metadata providers to return. I checked RexPatternFieldRef and I think it will meet our requirements: alpha would be the qualified table name, while the index would be the column index within the table. Thoughts?
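
The shape of such a reference can be sketched independently of Calcite as a small value class holding a qualified table name and a column index. TableColumnRef below is a hypothetical stand-in, not RexPatternFieldRef or any other Calcite class; it only shows that value equality on (qualified name, index) is what lets expressions from the query and from the view be compared.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Hypothetical column reference: qualified table name plus column index. */
final class TableColumnRef {
  final List<String> qualifiedName; // e.g. [hr, emps]
  final int index;                  // position of the column within the table

  TableColumnRef(int index, String... qualifiedName) {
    this.index = index;
    this.qualifiedName = Arrays.asList(qualifiedName);
  }

  @Override public boolean equals(Object o) {
    if (!(o instanceof TableColumnRef)) {
      return false;
    }
    TableColumnRef other = (TableColumnRef) o;
    return index == other.index && qualifiedName.equals(other.qualifiedName);
  }

  @Override public int hashCode() {
    return Objects.hash(qualifiedName, index);
  }

  @Override public String toString() {
    return String.join(".", qualifiedName) + ".$" + index;
  }
}
```

One limitation of this sketch is that two scans of the same table (a self-join) would produce identical references, so a real representation would likely need an extra identifier to tell them apart.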

I have started working on this and will provide a patch shortly; feedback is greatly appreciated.

[1] ftp://ftp10.us.freebsd.org/users/azhang/disc/SIGMOD/pdf-files/331/202-optimizing.pdf

Issue Links

is depended upon by

CALCITE-1731 Rewriting of queries using materialized views with joins and aggregates (Closed)

is related to

CALCITE-2189 RelMdAllPredicates fast bail out creates mismatch with RelMdTableReferences (Closed)

People

Assignee: Jesús Camacho Rodríguez

Reporter: Jesús Camacho Rodríguez

Votes: 0

Watchers: 3

Dates

Created: 08/Mar/17 11:34

Updated: 27/Feb/24 22:23

Resolved: 26/Apr/17 19:19