Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1812

Provide RelMetadataQuery from planner to rules and invalidate in transformTo

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.13.0
    • Component/s: core
    • Labels:

      Description

      Currently rules create a needed RelMetadataQuery when matched (ie. RuleXxx.onMatch. Since rules are called often, on complex query this triggers a multiplication RMQ instances used. But each new instance carries a heavy price on performance, because is the RMQ instance that hosts the cache used to memoize various metadata lookups (like getEstimatedRows, getPulledUpPredicates, uniqueness and so on). From HIVE-16757 investigation, the performance impact can be 10x.

      The proposal is to have a RelMetadataQuery in the planner and pass it to the rules as a RelOptRuleCall property. Rules onMatch can make use of this instance. The rule call transformTo can handle invalidating the RMQ instance.

      If code knowingly invalidates metadata properties associated with a node by means other than transformTo, it should somehow invalidate the current RMQ.

      The RMQ invalidation can be made smarter, but my plan right now is to do the dumb thing and just toss it away. Will revisit this later, depending on measured impact.

        Issue Links

          Activity

          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Resolved in release 1.13.0 (2017-06-26).

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Resolved in release 1.13.0 (2017-06-26).
          Hide
          julianhyde Julian Hyde added a comment -

          Snapshots get pushed automatically these days. I'm not sure of the triggering schedule, but there's probably one there already.

          Show
          julianhyde Julian Hyde added a comment - Snapshots get pushed automatically these days. I'm not sure of the triggering schedule, but there's probably one there already.
          Hide
          rusanu Remus Rusanu added a comment -

          Thanks Julian! Can you let me know when you push a refreshed 1.13.0-SNAPSHOT to Maven repos so I can try a Hive ptest run? Thanks.

          Show
          rusanu Remus Rusanu added a comment - Thanks Julian! Can you let me know when you push a refreshed 1.13.0-SNAPSHOT to Maven repos so I can try a Hive ptest run? Thanks.
          Hide
          julianhyde Julian Hyde added a comment -

          Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/56d5261a. Thanks for the PR, Remus Rusanu!

          I moved the cached RelMetadataQuery from RelOptPlanner to RelOptCluster, so that it doesn't clash with lifecycle changes coming in CALCITE-1536.

          Show
          julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/56d5261a . Thanks for the PR, Remus Rusanu ! I moved the cached RelMetadataQuery from RelOptPlanner to RelOptCluster, so that it doesn't clash with lifecycle changes coming in CALCITE-1536 .
          Hide
          rusanu Remus Rusanu added a comment -

          I have fixed the Javadoc and reverted the changes in deprecated methods.

          Show
          rusanu Remus Rusanu added a comment - I have fixed the Javadoc and reverted the changes in deprecated methods.
          Hide
          rusanu Remus Rusanu added a comment -

          Whatever structure will end up after CALCITE-1536 representing the next-to-last bullet "runtime state such as rule queues and equivalence sets" should probably anchor the RMQ. As it is now, I find the planner the best place. The RelOpRuleCall is way to short lived, in my tests I end up with +60k invocations and these cascade to millions of cache miss caused MD calls. I looked at placing it into the cluster, but then I see no clear way to pass it to the rule onMatch via the RelOptRuleCall, other than retrieving the cluster from the RelNode(s) in the bindings. Another option is to remove the planner/call from the equation and have the RMQ always be retrieved via RelNode.getCluster().getRMQ(). Do you find this more palatable than hosting it in the planner?

          Show
          rusanu Remus Rusanu added a comment - Whatever structure will end up after CALCITE-1536 representing the next-to-last bullet "runtime state such as rule queues and equivalence sets" should probably anchor the RMQ. As it is now, I find the planner the best place. The RelOpRuleCall is way to short lived, in my tests I end up with +60k invocations and these cascade to millions of cache miss caused MD calls. I looked at placing it into the cluster, but then I see no clear way to pass it to the rule onMatch via the RelOptRuleCall, other than retrieving the cluster from the RelNode(s) in the bindings. Another option is to remove the planner/call from the equation and have the RMQ always be retrieved via RelNode.getCluster().getRMQ(). Do you find this more palatable than hosting it in the planner?
          Hide
          julianhyde Julian Hyde added a comment -

          Also:

          • I see some javadoc errors; please check those
          • don't fix deprecated methods like Join.estimateJoinedRows; users of those methods should feel pain.
          Show
          julianhyde Julian Hyde added a comment - Also: I see some javadoc errors; please check those don't fix deprecated methods like Join.estimateJoinedRows; users of those methods should feel pain.
          Hide
          julianhyde Julian Hyde added a comment -

          This seems to conflict with CALCITE-1536, where we're want to put less state in the planner (because the planner is has a shorter lifespan than a RelNode). How great would be the harm if code that doesn't have a RelOptRuleCall available (i.e. does not have an RelOptRule.onMatch or a RelNode.convert on the stack) has to call RelMetadataQuery.instance()?

          That's what I had in might when we originally discussed adding getMetadataQuery to RelOptRuleCall.

          It's possible that getMetadataQuery could live in RelOptCluster. I haven't checked.

          Show
          julianhyde Julian Hyde added a comment - This seems to conflict with CALCITE-1536 , where we're want to put less state in the planner (because the planner is has a shorter lifespan than a RelNode). How great would be the harm if code that doesn't have a RelOptRuleCall available (i.e. does not have an RelOptRule.onMatch or a RelNode.convert on the stack) has to call RelMetadataQuery.instance()? That's what I had in might when we originally discussed adding getMetadataQuery to RelOptRuleCall. It's possible that getMetadataQuery could live in RelOptCluster. I haven't checked.
          Hide
          rusanu Remus Rusanu added a comment -

          Julian Hyde can you do a review on this? Thanks.

          Show
          rusanu Remus Rusanu added a comment - Julian Hyde can you do a review on this? Thanks.
          Hide
          rusanu Remus Rusanu added a comment -

          RelMetadataQuery instance is held by planner and carried by the call to the rule. Most places the RMQ was needed in onMatch, where the call is available, or on a method called form onMatch where it could easily be passed as parameter. LoptMultiJoin functions needed RMQ and I had to pass the parameter to several functions.
          Since RelNode can get the planner (via cluster) de-facto any reference to a RelNode can retrieve the needed RMQ. I used this in places where the call was not available. This includes also the deprecated functions like getRows.

          Show
          rusanu Remus Rusanu added a comment - RelMetadataQuery instance is held by planner and carried by the call to the rule. Most places the RMQ was needed in onMatch, where the call is available, or on a method called form onMatch where it could easily be passed as parameter. LoptMultiJoin functions needed RMQ and I had to pass the parameter to several functions. Since RelNode can get the planner (via cluster) de-facto any reference to a RelNode can retrieve the needed RMQ. I used this in places where the call was not available. This includes also the deprecated functions like getRows.
          Show
          rusanu Remus Rusanu added a comment - https://github.com/apache/calcite/pull/462

            People

            • Assignee:
              rusanu Remus Rusanu
              Reporter:
              rusanu Remus Rusanu
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development