Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1731

Rewriting of queries using materialized views with joins and aggregates

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.13.0
    • Component/s: core
    • Labels:
      None

      Description

      The idea is still to build a rewriting approach similar to:
      ftp://ftp.cse.buffalo.edu/users/azhang/disc/SIGMOD/pdf-files/331/202-optimizing.pdf

      I tried to build on CALCITE-1389 work. However, finally I ended up creating a new alternative rule. The main reason is that I wanted to follow the paper more closely and not rely on triggering rules within the MV rewriting to find whether expressions are equivalent. Instead, we extract information from the query plan and the MVs plans using the new metadata providers proposed in CALCITE-1682, and then we use that information to validate and execute the rewriting.

      I also implemented new unifying/rewriting logic within the rule, since existing unifying rules for aggregates were assuming that aggregate inputs in the query and the MV needed to be equivalent (same Volcano node). That condition can be relaxed because we verify in the rule, by using the new metadata providers as stated above, that the result for the query is contained within the MV.

      I added multiple tests, but any feedback pointing to new tests that could be added to check correctness/coverage is welcome.

      Algorithm can trigger multiple rewritings for the same query node. In addition, support for multiple usages of tables in query/MVs is supported.

      A few extensions that will follow this issue:

      • Extend logic to filter relevant MVs for a given query node, so approach is scalable as number of MVs grows.
      • Produce rewritings using Union operators, e.g., a given query could be partially answered from the MV (year = 2014) and from the query (not(year=2014)). If the MV is stored e.g. in Druid, this rewriting might be beneficial. As with the other rewritings, decision on whether to finally use the rewriting should be cost-based.

        Issue Links

          Activity

          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Resolved in release 1.13.0 (2017-06-26).

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Resolved in release 1.13.0 (2017-06-26).
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Fixed in three commits:

          Thanks for the feedback Julian Hyde!

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Fixed in three commits: Initial implementation: http://git-wip-us.apache.org/repos/asf/calcite/commit/84b49f5 . Support for referential constraints: http://git-wip-us.apache.org/repos/asf/calcite/commit/1f81e13 . Removal of previous join materialized view rewriting rule: http://git-wip-us.apache.org/repos/asf/calcite/commit/27ca310 . Thanks for the feedback Julian Hyde !
          Hide
          julianhyde Julian Hyde added a comment - - edited

          Regarding your change, my only remarks are about cosmetics, I'm afraid: Can you change the /* comments to /** javadoc even for private methods? Javadoc comments allow references to classes, methods, fields, and the IDE and tools check these references, so the comments seem to stay more up to date. Also be sure to use <p> to separate paragraphs.

          +1

          Show
          julianhyde Julian Hyde added a comment - - edited Regarding your change, my only remarks are about cosmetics, I'm afraid: Can you change the /* comments to /** javadoc even for private methods? Javadoc comments allow references to classes, methods, fields, and the IDE and tools check these references, so the comments seem to stay more up to date. Also be sure to use <p> to separate paragraphs. +1
          Hide
          julianhyde Julian Hyde added a comment -

          RelOptReferentialConstraint goes with RelOptTable (i.e. allocated during query preparation) but for the purposes of CALCITE-1545, we would also need something that is part of the schema SPI, i.e. goes with org.apache.calcite.schema.Table and has a lifetime longer than a particular query. I would imagine a method Schema.getForeignKeys that is sufficient to power DatabaseMetaData.getImportedKeys (and getExportedKeys, which is very similar).

          Show
          julianhyde Julian Hyde added a comment - RelOptReferentialConstraint goes with RelOptTable (i.e. allocated during query preparation) but for the purposes of CALCITE-1545 , we would also need something that is part of the schema SPI, i.e. goes with org.apache.calcite.schema.Table and has a lifetime longer than a particular query. I would imagine a method Schema.getForeignKeys that is sufficient to power DatabaseMetaData.getImportedKeys (and getExportedKeys , which is very similar).
          Hide
          julianhyde Julian Hyde added a comment -

          Since there is a RelOptTable.getQualifiedName() method, an alternative with the same information content might be

          interface RelOptReferentialConstraint {
            /**The referencing table, e.g. DEPT. */
            RelOptTable getSource();
          
            /** The referenced table, e.g. EMP. */
            RelOptTable getTarget();
          
            /** The (source, target) column ordinals. */
            List<IntPair> getColumnPairs();
          }
          

          But it's a matter of taste.

          Show
          julianhyde Julian Hyde added a comment - Since there is a RelOptTable.getQualifiedName() method, an alternative with the same information content might be interface RelOptReferentialConstraint { /**The referencing table, e.g. DEPT. */ RelOptTable getSource(); /** The referenced table, e.g. EMP. */ RelOptTable getTarget(); /** The (source, target) column ordinals. */ List<IntPair> getColumnPairs(); } But it's a matter of taste.
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Julian Hyde, I have pushed a new commit in https://github.com/apache/calcite/pull/414/commits/7c6cf6f11c75d75157ad1c6cdeb13d26e3f25561 . I was wondering if you could take a look at the referential constraints (RelOptReferentialConstraint.java) in that commit and let me know whether you would add/modify/remove anything from their interface/implementation? That work should be used by other tasks to to represent database metadata at the planning phase (CALCITE-1545), thus I would like that it is stable. FYI, the current implementation covers our needs from the Hive side.

          Michael Mior, thanks for the response. I will remove it then when this work is merged.

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Julian Hyde , I have pushed a new commit in https://github.com/apache/calcite/pull/414/commits/7c6cf6f11c75d75157ad1c6cdeb13d26e3f25561 . I was wondering if you could take a look at the referential constraints ( RelOptReferentialConstraint.java ) in that commit and let me know whether you would add/modify/remove anything from their interface/implementation? That work should be used by other tasks to to represent database metadata at the planning phase ( CALCITE-1545 ), thus I would like that it is stable. FYI, the current implementation covers our needs from the Hive side. Michael Mior , thanks for the response. I will remove it then when this work is merged.
          Hide
          michaelmior Michael Mior added a comment -

          Sorry for the delayed response. Looks great! Yes, this does seem to obsolete the work I did in 1389. I'd suggest removing that rule if this is to be merged to avoid confusion.

          Show
          michaelmior Michael Mior added a comment - Sorry for the delayed response. Looks great! Yes, this does seem to obsolete the work I did in 1389. I'd suggest removing that rule if this is to be merged to avoid confusion.
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Julian Hyde, thanks for the feedback. I have updated the PR and I answer to your comments below:

          • I have added the missing comments, trying to elaborate more on the new metadata providers functioning.
          • The paper cited above refers to them as compensation predicates/expressions. If I remember correctly, I have seen them referred to as differential predicates/expressions too.
            In the method description, the relation between condition, target, and residue is defined as:
            condition = target AND residue
            In the example you mention, we have condition: x = 1, target: x = 1 OR z = 3. Then the residue was: NOT (z = 3), which is not correct.
            x = 1 <=> (x = 1 OR z = 3) AND NOT(z = 3)
            x = 1 <=> (x = 1 AND NOT(z = 3)) OR (z = 3 AND NOT(z = 3))
            x = 1 <=> (x = 1 AND NOT(z = 3)) (it does not hold)
            

            However, current result (residue: x = 1) is correct:

            x = 1 <=> (x = 1 OR z = 3) AND (x = 1) (it holds)
            
          • Replaced qualifiedName in RelTableRef by list of String; I do not know why I was handling qualifiedName as a single String in the first place.
          • Replaced RelTableRef.identifier by RelTableRef.entityNumber. Is it better?
          • I have added comments for RexInputTableRef and RelTableRef stating their purpose. I also nested RelTableRef within RexInputTableRef, probably a better way of encapsulating them both. RexInputTableRef is useful in the context of provenance/lineage, thus it will not be used through the whole planning phase. In fact, its usage is currently constrained to the new rewriting rule. Any new rewriting rule/auxiliary method will be able to make use of these new classes, but expressions generated by rewriting rules for the planner should not contain them anymore.
          • PR already contains new tests for SubstitutionVisitor.splitFilter which is the logic that was extended in SubstitutionVisitor (splitOr was also changed but it is called from splitFilter). Existing tests were also updated. In addition, new rewriting rule relies on splitFilter to generate compensation predicates, thus it is indirectly tested in MaterializationTest too.
          • I have extended materialized_views.md.
          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Julian Hyde , thanks for the feedback. I have updated the PR and I answer to your comments below: I have added the missing comments, trying to elaborate more on the new metadata providers functioning. The paper cited above refers to them as compensation predicates/expressions. If I remember correctly, I have seen them referred to as differential predicates/expressions too. In the method description, the relation between condition, target, and residue is defined as: condition = target AND residue In the example you mention, we have condition: x = 1, target: x = 1 OR z = 3. Then the residue was: NOT (z = 3), which is not correct. x = 1 <=> (x = 1 OR z = 3) AND NOT(z = 3) x = 1 <=> (x = 1 AND NOT(z = 3)) OR (z = 3 AND NOT(z = 3)) x = 1 <=> (x = 1 AND NOT(z = 3)) (it does not hold) However, current result (residue: x = 1) is correct: x = 1 <=> (x = 1 OR z = 3) AND (x = 1) (it holds) Replaced qualifiedName in RelTableRef by list of String; I do not know why I was handling qualifiedName as a single String in the first place. Replaced RelTableRef.identifier by RelTableRef.entityNumber . Is it better? I have added comments for RexInputTableRef and RelTableRef stating their purpose. I also nested RelTableRef within RexInputTableRef , probably a better way of encapsulating them both. RexInputTableRef is useful in the context of provenance/lineage, thus it will not be used through the whole planning phase. In fact, its usage is currently constrained to the new rewriting rule. Any new rewriting rule/auxiliary method will be able to make use of these new classes, but expressions generated by rewriting rules for the planner should not contain them anymore. PR already contains new tests for SubstitutionVisitor.splitFilter which is the logic that was extended in SubstitutionVisitor ( splitOr was also changed but it is called from splitFilter ). Existing tests were also updated. In addition, new rewriting rule relies on splitFilter to generate compensation predicates, thus it is indirectly tested in MaterializationTest too. I have extended materialized_views.md .
          Hide
          julianhyde Julian Hyde added a comment -

          Also, can you expand https://github.com/apache/calcite/blob/master/site/_docs/materialized_views.md, which Michael Mior created a few days ago. I would like people (and ourselves) to understand the tools in this complex toolbox we are building, and when to use them.

          Show
          julianhyde Julian Hyde added a comment - Also, can you expand https://github.com/apache/calcite/blob/master/site/_docs/materialized_views.md , which Michael Mior created a few days ago. I would like people (and ourselves) to understand the tools in this complex toolbox we are building, and when to use them.
          Hide
          julianhyde Julian Hyde added a comment - - edited
          • Lots of blank comments. Can you fill them out. Be sure to use <p> paragraph markers.
          • Are you sure about the residue of x = 1 OR z = 3? (I get confused about residues. Is there a standard term for them?)
          • I worry about dots and quotes in qualified table names. Is the qualified table name in RelTableRef fully escaped? Use a list of strings instead?
          • RelTableRef.identifier is not a good name for an int field.
          • Are RelTableRef and RexInputTableRef likely to occur in all phases of planning, or just a particular phase? If RexInputTableRef can occur throughout planning it will be a huge burden. The javadoc needs to spell out its purpose and lifecycle much more clearly.
          • It seems that you have made several of the RexNode methods in SubstitutionVisitor smarter. Can they be unit tested?
          Show
          julianhyde Julian Hyde added a comment - - edited Lots of blank comments. Can you fill them out. Be sure to use <p> paragraph markers. Are you sure about the residue of x = 1 OR z = 3 ? (I get confused about residues. Is there a standard term for them?) I worry about dots and quotes in qualified table names. Is the qualified table name in RelTableRef fully escaped? Use a list of strings instead? RelTableRef.identifier is not a good name for an int field. Are RelTableRef and RexInputTableRef likely to occur in all phases of planning, or just a particular phase? If RexInputTableRef can occur throughout planning it will be a huge burden. The javadoc needs to spell out its purpose and lifecycle much more clearly. It seems that you have made several of the RexNode methods in SubstitutionVisitor smarter. Can they be unit tested?
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          PR is in https://github.com/apache/calcite/pull/414

          Julian Hyde, could you take a look at it? Also Maryann Xue and Michael Mior, it would be great if you could check it since you have worked with MVs in Calcite and you are pretty familiar with them. Core work is in new AbstractMaterializedViewRule.java rule.

          Michael Mior, I think this code supersedes the rule created in CALCITE-1389. Could you validate that? I tried to build on that rule, but finally approach was rather different, thus I decided it to rewrite it from scratch.

          If you feel any tests are missing or should be added, please let me know and I will add them. While review is in progress, I might also add some additional tests.

          Thanks

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - PR is in https://github.com/apache/calcite/pull/414 Julian Hyde , could you take a look at it? Also Maryann Xue and Michael Mior , it would be great if you could check it since you have worked with MVs in Calcite and you are pretty familiar with them. Core work is in new AbstractMaterializedViewRule.java rule. Michael Mior , I think this code supersedes the rule created in CALCITE-1389 . Could you validate that? I tried to build on that rule, but finally approach was rather different, thus I decided it to rewrite it from scratch. If you feel any tests are missing or should be added, please let me know and I will add them. While review is in progress, I might also add some additional tests. Thanks

            People

            • Assignee:
              jcamachorodriguez Jesus Camacho Rodriguez
              Reporter:
              jcamachorodriguez Jesus Camacho Rodriguez
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development