Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-783

Infer collation of Project using monotonicity

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0-incubating
    • Component/s: core
    • Labels:
      None

      Description

      Current implementation of RelMdCollation#project doesn't handle function expressions and because of that we loose any collation metadata related to monotonic expressions which can be useful for generating streaming query execution plans involving tumbling windows.

      Following is how current code looks like (RelMdCollation):

      185: for (Ord<RexNode> project : Ord.zip(projects)) {
      186: if (project.e instanceof RexInputRef)

      { 187: targets.put(((RexInputRef) project.e).getIndex(), project.i); 188: }


      189: }

      We only handle projects of type RexInputRef. But to support monotonic expressions we should laso handle projects of type RexCall.

      1. CALCITE-783-0.patch
        2 kB
        Milinda Lakmal Pathirage
      2. CALCITE-783-1.patch
        56 kB
        Milinda Lakmal Pathirage

        Issue Links

          Activity

          Hide
          milinda Milinda Lakmal Pathirage added a comment -

          This patch doesn't handle all the corner cases. But created this to get Calcite community's feedback to know whether this approach is correct or not.

          Show
          milinda Milinda Lakmal Pathirage added a comment - This patch doesn't handle all the corner cases. But created this to get Calcite community's feedback to know whether this approach is correct or not.
          Hide
          julianhyde Julian Hyde added a comment -

          Yes, this is a problem. The "TODO: also monotonic expressions" comment in the method you patched acknowledges this.

          However I wouldn't put the logic into RelMdCollation as you have done. It needs to be possible to extend the logic when new functions (possibly user-defined) are added. We have a good start in the SqlOperator method

            public SqlMonotonicity getMonotonicity(
                SqlCall call,
                SqlValidatorScope scope) {
              return SqlMonotonicity.NOT_MONOTONIC;
            }
          

          which is used by the validator and has overrides for functions including EXTRACT, CAST, FLOOR, SUBSTRING, +, all of which can be monotonic in some circumstances. SqlValidatorTest.testMonotonic has quite a long list of test cases. However that method only works on a SqlCall (i.e. a parse tree) whereas we need to apply the same logic to a RexNode.

          So I think the solution is to change the first parameter of that method from SqlCall to SqlOperatorBinding. SqlCallBinding is an interface which has an implementation that wraps a SqlCall, and another implementation that wraps a RexCall.

          I have made a start on this in my branch https://github.com/julianhyde/incubator-calcite/tree/783-monotonicity. Milinda Lakmal Pathirage, could you finish RelMdCollation.project?

          Show
          julianhyde Julian Hyde added a comment - Yes, this is a problem. The "TODO: also monotonic expressions" comment in the method you patched acknowledges this. However I wouldn't put the logic into RelMdCollation as you have done. It needs to be possible to extend the logic when new functions (possibly user-defined) are added. We have a good start in the SqlOperator method public SqlMonotonicity getMonotonicity( SqlCall call, SqlValidatorScope scope) { return SqlMonotonicity.NOT_MONOTONIC; } which is used by the validator and has overrides for functions including EXTRACT, CAST, FLOOR, SUBSTRING, +, all of which can be monotonic in some circumstances. SqlValidatorTest.testMonotonic has quite a long list of test cases. However that method only works on a SqlCall (i.e. a parse tree) whereas we need to apply the same logic to a RexNode. So I think the solution is to change the first parameter of that method from SqlCall to SqlOperatorBinding. SqlCallBinding is an interface which has an implementation that wraps a SqlCall, and another implementation that wraps a RexCall. I have made a start on this in my branch https://github.com/julianhyde/incubator-calcite/tree/783-monotonicity . Milinda Lakmal Pathirage , could you finish RelMdCollation.project?
          Hide
          milinda Milinda Lakmal Pathirage added a comment -

          Julian Hyde, I'll finish the RelMdCollaiton.project. Thanks for looking into this.

          Show
          milinda Milinda Lakmal Pathirage added a comment - Julian Hyde , I'll finish the RelMdCollaiton.project. Thanks for looking into this.
          Hide
          milinda Milinda Lakmal Pathirage added a comment - - edited

          Hi Julian Hyde,

          I changed RelMdCollation and RellCallBinding to get monotonicity metadata to aggregates. But after these changes several tests listed below failed. I think all failures are caused by a single assertion in VolcanoPlanner (org.apache.calcite.plan.volcano.VolcanoPlanner.changeTraits(VolcanoPlanner.java:693)). toTraits.allSimple() returns false due to the composite trait created when input's traits are used to infer LogicalAggregate's trait set.

          But I'm not sure what is the proper way to fix this. Can you please have a look at this patch.

          Show
          milinda Milinda Lakmal Pathirage added a comment - - edited Hi Julian Hyde , I changed RelMdCollation and RellCallBinding to get monotonicity metadata to aggregates. But after these changes several tests listed below failed. I think all failures are caused by a single assertion in VolcanoPlanner (org.apache.calcite.plan.volcano.VolcanoPlanner.changeTraits(VolcanoPlanner.java:693)). toTraits.allSimple() returns false due to the composite trait created when input's traits are used to infer LogicalAggregate's trait set. But I'm not sure what is the proper way to fix this. Can you please have a look at this patch.
          Hide
          julianhyde Julian Hyde added a comment -

          Will do. By the way, there are a lot of spurious formatting changes in your patch. You'll need to back these out. I'll keep your changes separate from mine so it should be fairly easy for you to back out the formatting changes.

          Show
          julianhyde Julian Hyde added a comment - Will do. By the way, there are a lot of spurious formatting changes in your patch. You'll need to back these out. I'll keep your changes separate from mine so it should be fairly easy for you to back out the formatting changes.
          Hide
          milinda Milinda Lakmal Pathirage added a comment -

          Hi Julian Hyde, Will attach a new patch after fixing formatting changes.

          Show
          milinda Milinda Lakmal Pathirage added a comment - Hi Julian Hyde , Will attach a new patch after fixing formatting changes.
          Hide
          julianhyde Julian Hyde added a comment -

          I've checked into https://github.com/julianhyde/incubator-calcite/tree/783-monotonicity. The tests all pass, but there are errors generating javadoc due to introduced '<p/>'s. Can you please rebase to latest master, revert the formatting changes, and send a pull request.

          Show
          julianhyde Julian Hyde added a comment - I've checked into https://github.com/julianhyde/incubator-calcite/tree/783-monotonicity . The tests all pass, but there are errors generating javadoc due to introduced '<p/>'s. Can you please rebase to latest master, revert the formatting changes, and send a pull request.
          Hide
          milinda Milinda Lakmal Pathirage added a comment -

          Hi Julian Hyde, Did you mean rebasing on top of the master branch of https://github.com/apache/incubator-calcite with my changes and your changes from https://github.com/julianhyde/incubator-calcite/tree/783-monotonicity.

          Show
          milinda Milinda Lakmal Pathirage added a comment - Hi Julian Hyde , Did you mean rebasing on top of the master branch of https://github.com/apache/incubator-calcite with my changes and your changes from https://github.com/julianhyde/incubator-calcite/tree/783-monotonicity .
          Hide
          julianhyde Julian Hyde added a comment - - edited

          Yes, rebase on top of master.

          git checkout master
          git pull --rebase
          git checkout 783-monotonicity
          # edit to remove whitespace changes
          git commit -a
          git rebase -i master
          

          During that last rebase, you can re-order and squash commits. In particular you can squash the last commit (removing whitespace changes) into your previous commit that introduced them.

          Show
          julianhyde Julian Hyde added a comment - - edited Yes, rebase on top of master. git checkout master git pull --rebase git checkout 783-monotonicity # edit to remove whitespace changes git commit -a git rebase -i master During that last rebase, you can re-order and squash commits. In particular you can squash the last commit (removing whitespace changes) into your previous commit that introduced them.
          Hide
          julianhyde Julian Hyde added a comment -

          Milinda Lakmal Pathirage, do you think you can get this patch complete by Wednesday (7/22) as we are stabilizing for a 1.4 release?

          Show
          julianhyde Julian Hyde added a comment - Milinda Lakmal Pathirage , do you think you can get this patch complete by Wednesday (7/22) as we are stabilizing for a 1.4 release?
          Hide
          milinda Milinda Lakmal Pathirage added a comment -

          Hi Julian Hyde, I opened this [1] pull request 11 days ago, but forgot to update the jira. I removed all the extra spaces added in the original patch and all tests ran successfully with your changes. Please let me know if anything is wrong with the pull request.

          [1] https://github.com/apache/incubator-calcite/pull/104

          Show
          milinda Milinda Lakmal Pathirage added a comment - Hi Julian Hyde , I opened this [1] pull request 11 days ago, but forgot to update the jira. I removed all the extra spaces added in the original patch and all tests ran successfully with your changes. Please let me know if anything is wrong with the pull request. [1] https://github.com/apache/incubator-calcite/pull/104
          Show
          julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/c711fed6 and http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/9177063b . Thanks for the fix!
          Hide
          maryannxue Maryann Xue added a comment -

          Hi Julian Hyde and Milinda Lakmal Pathirage, I need to understand the change to LogicalAggregate. So why does it return the collation of its child. Suppose we have a table t(pk, c1, c2) ordered on pk, and we group by c1, and most likely we'd use hash group-by or sort group-by, but neither will preserve the original pk order. And specifying a certain collation trait in the Logical rels would actually mean enforcing corresponding user rels to have the same collation and enforcing their implementations. With the existing collation traits of any other Logical rel, they are either required by spec (like LogicalSort) or logically derivable (like LogicalProject), but this one is neither.

          Show
          maryannxue Maryann Xue added a comment - Hi Julian Hyde and Milinda Lakmal Pathirage , I need to understand the change to LogicalAggregate. So why does it return the collation of its child. Suppose we have a table t(pk, c1, c2) ordered on pk, and we group by c1, and most likely we'd use hash group-by or sort group-by, but neither will preserve the original pk order. And specifying a certain collation trait in the Logical rels would actually mean enforcing corresponding user rels to have the same collation and enforcing their implementations. With the existing collation traits of any other Logical rel, they are either required by spec (like LogicalSort) or logically derivable (like LogicalProject), but this one is neither.
          Hide
          milinda Milinda Lakmal Pathirage added a comment -

          Hi Maryann Xue, I think you are right regarding this. It is my fault that I blindly use LogicalAggregate's input's collation to infer aggregate's collation. As you have mentioned inferring aggregate's collation is not that straight forward. For tumbling windows based on group by to work, we just need to properly infer project's trait set based on monotonicity metadata. I'll create a JIRA ticket for this. Thanks for pointing this.

          Show
          milinda Milinda Lakmal Pathirage added a comment - Hi Maryann Xue , I think you are right regarding this. It is my fault that I blindly use LogicalAggregate's input's collation to infer aggregate's collation. As you have mentioned inferring aggregate's collation is not that straight forward. For tumbling windows based on group by to work, we just need to properly infer project's trait set based on monotonicity metadata. I'll create a JIRA ticket for this. Thanks for pointing this.
          Hide
          maryannxue Maryann Xue added a comment -

          Thanks a lot for your reply, Milinda Lakmal Pathirage! It think LogicalAggregate as a general representation should not have any inferred collation, since its underlying implementation really depends. But we have similar use cases in Phoenix regarding a rel's inferred collation, and we do it in the corresponding implementation rel classes, like our hash-join implementation of LogicalJoin is called PhoenixServerJoin and it takes its first child rel's collation as its own collation. Maybe you can do it this way as well?

          Show
          maryannxue Maryann Xue added a comment - Thanks a lot for your reply, Milinda Lakmal Pathirage ! It think LogicalAggregate as a general representation should not have any inferred collation, since its underlying implementation really depends. But we have similar use cases in Phoenix regarding a rel's inferred collation, and we do it in the corresponding implementation rel classes, like our hash-join implementation of LogicalJoin is called PhoenixServerJoin and it takes its first child rel's collation as its own collation. Maybe you can do it this way as well?
          Hide
          milinda Milinda Lakmal Pathirage added a comment -

          Maryann Xue, We just need the first child rel's collation and we actually don't need aggregate rel's collation for group by. I was wrong when I thought we need to infer aggregate rel's collation.

          Show
          milinda Milinda Lakmal Pathirage added a comment - Maryann Xue , We just need the first child rel's collation and we actually don't need aggregate rel's collation for group by. I was wrong when I thought we need to infer aggregate rel's collation.
          Hide
          jnadeau Jacques Nadeau added a comment -

          Resolved in release 1.4.0-incubating (2015-08-23)

          Show
          jnadeau Jacques Nadeau added a comment - Resolved in release 1.4.0-incubating (2015-08-23)

            People

            • Assignee:
              julianhyde Julian Hyde
              Reporter:
              milinda Milinda Lakmal Pathirage
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development