Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-786

Detect if materialized view can be used to rewrite a query in non-trivial cases

    XMLWordPrintableJSON

Details

    • Task
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • 1.4.0-incubating, 1.3.0-incubating
    • 1.4.0-incubating
    • core
    • None

    Description

      Improvement to detection if MV can be used to rewrite queries in non-trivial cases.

      Pasting the email conversation below that happened over this which briefly discusses the approach taken:

      ---------- Forwarded message ----------
      From: Amogh Margoor <amoghm@qubole.com>
      Date: Fri, Jun 26, 2015 at 10:11 AM
      Subject: Re: Detect if materialized view can be used to rewrite a query in non-trivial cases
      To: dev@calcite.incubator.apache.org, Rajat Venkatesh <rvenkatesh@qubole.com>

      Hi Julian,

      Thanks a lot Julian for your feedback. I have inlined my response below which also includes the commit done.

      Regards,
      Amogh

      On Fri, Jun 26, 2015 at 1:05 AM, Julian Hyde <jhyde@apache.org> wrote:

      This is great work. Certainly consistent with where I am heading.

      I would not be inclined to use DNF (because of its tendency to inflate
      certain predicates) but if you are able to get something effective I
      will happily use it. I think you should package it behind a method –
      "find out what is left to satisfy p when you have already satisfied q"
      or something – and write lots of tests of that method, and it doesn't
      really matter what algorithm is behind it.

      Take a look at SubstitutionVisitor.simplfy(RexNode) and how it focuses
      on finding whether

      p1 AND p2 AND p3 AND NOT (q1 AND q2 AND q3)

      is satisfiable.

      >> I saw this method. I will try to use this in improvements to follow.
      >> It didnot seem to solve this currently: (x>10 => x>30) i.e., find if
      >> NOT (NOT(x>10 ) OR x >30) is satisfiable.
      >> We have currently packaged it as "if X => Y" (see RexImplicationChecker
      >> in the commit I shared below), but agree it should be
      >> more generic like what you suggested above and something we can try to achieve.

      Later we will want to know not just "can I satisfy query Q using
      materialization M?" but "can I satisfy part of Q using M, and what is
      left over?". I can convert most of Q to use an aggregate table over
      years 2012 .. 2014 and 2015 Jan - May, and then scan the raw data for
      June 1st onwards, that is a big win.

      >> This certainly should be something we should aim at.

      What branch are you working on? Your master branch
      https://github.com/qubole/incubator-calcite/tree/master seems to be
      the same as apache/master right now.

      >> We work on https://github.com/qubole/incubator-calcite/tree/qds-1.3 .
      >> This is the commit: https://github.com/qubole/incubator-calcite/pull/1/files?diff=unified
      >> We are in the process of writing UTs for it. We did most of the testing through our client code till now.
      >> We have created new Visitor extending SustitutionVisitor because did not want to mess with the existing code.
      >> More rules need to be added to the new Visitor.
      >> Will raise a PR once UTs are added and testing is complete.

      If you can divide this work into pull requests with unit tests, I will
      happy commit each change as you make progress.

      By the way, I logged a few jira cases connected to materialized view
      rewrite today. They were motivated by the phoenix team wanting to use
      secondary indexes. But they could by applied to any scan-project-sort
      materialization. See

      >> Thanks for sharing this info Julian. Will definitely take a look.

      Julian

      On Thu, Jun 25, 2015 at 11:46 AM, Amogh Margoor <amoghm@qubole.com> wrote:
      > Hi,
      > We were working on a problem to detect if materialized view can be used to
      > rewrite a query in non-trivial cases. Will briefly describe the problem and
      > approach below and would appreciate feedback on the same.
      >
      > Motivation
      > ---------------
      > For instance there exists a table "logs" and a partition (materialized
      > view) named "log_after_01_Jan" created on it and described by SQL :
      > "Select * from logs where log_date > '01-01-2015' ".
      >
      > Assume that the table "log_after_01_Jan" is much smaller than table "logs".
      >
      > For user query:
      > "Select log_body from logs where log_date > '03-03-2015' and
      > char_length(log_body) < 100",
      > we should detect that the materialized view "log_after_01_Jan" can be used
      > and transform the query into:
      >
      > "Select log_body from log_after_01_Jan where log_date > '03-03-2015' and
      > char_length(log_body) < 100"
      >
      > Approach
      > --------------
      > One of the fundamental problems we would come across here is to check if a
      > boolean condition X implies (=>) Y. This quickly reduces to the
      > Satisfiability problem which is NP complete for propositional logic. But
      > there are many instances like above which can be detected easily. We have
      > implemented an approach to handle several useful cases for few operators
      > and types of operands. Will be extending it further for more types of
      > operations.
      >
      > Top Level approach:
      >
      > 1. Currently, VolcanoPlanner:useMaterialization tries to rewrite original
      > query using MV using SubstitutionVisitor. Have extended SubstitutionVisitor
      > to detect above cases and do the substitution.
      >
      > 2. To check if a condition X => Y,
      > a. Convert both of them into Disjunctive Normal Form.
      > Say X is transformed into x_1 or x_2 or x_3 ... or x_m and
      > Y is transformed into y_1 or y_2 ,... or y_i, where any x_i and y_i
      > are conjunctions of atomic predicates.
      > For instance condition "(a>10 or b>20) and c <90" will be converted
      > to DNF: (a>10 and c<90) or (b>20 and c<90).
      >
      > b. For X=>Y to be a tautology i.e., hold always true, every conjunction
      > x_i should imply atleast one of the conjunction y_j.
      > We wrote some set of simple heuristics to check if a conjunction of
      > atomic predicates implies another.
      > This also involves executing RexNode using RexImplExecutor.
      >
      > We have checked in code for this in our fork of
      > calcite(qubole/incubator-calcite). This is ongoing work and we will be
      > making many more improvements to it. If this is useful or anybody is
      > interested in giving feedback then I can share the commit so that we can
      > discuss about it and take it forward.
      >
      > Regards,
      > Amogh
      > Member of Technical Staff
      > Qubole

      Attachments

        Activity

          People

            julianhyde Julian Hyde
            amargoor Amogh Margoor
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: