Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-263

Add operand type that will cause a rule to fire when a new subset is created

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      For trait propagation, it would be useful to be able to create a rule that is fired each time a relational expression (`RelNode`) is available with a set of traits (`RelTraitSet`) that it didn't previously have.

      Currently if you create an operand for an input to a relational expression, the rule will be fired every time a relational expression is added to that set (`RelSet`) or subset (`RelSubset`); if you do not create an operand, the rule will be fired once (basically, when the set is created); this would add something in the middle, firing the rule once for each subset. (A subset, you recall, is the combination of a set and a set of traits.)

      For instance, you could write a rule for sort-aggregation. If we are trying to evaluate `select ... group by x`, then the rule might fire once for `AggregateRel` on a subset sorted by [y], once for `AggregateRel` on a subset sorted by [x, y], once for an `AggregateRel` sorted on [x]. The firings for [x, y] and [x] could produce a sort-aggregation, because sort-aggregation requires the grouping key to be on the leading edge of the sort-key.

      ---------------- Imported from GitHub ----------------
      Url: https://github.com/julianhyde/optiq/issues/263
      Created by: julianhyde
      Labels:
      Created at: Sat Apr 26 02:32:24 CEST 2014
      State: closed

        Activity

        Hide
        github-import GitHub Import added a comment -

        [Date: Wed May 21 08:19:15 CEST 2014, Author: julianhyde]

        Expanding on the example above. The rule would look like this:

        ```java
        class SortedAggregateRule {
        private SortedAggregateRule()

        { super( operand(AggregateRel.class, operand(RelSubset.class, any()))); }

        public void onMatch(RelOptRuleCall call)

        { final AggregateRel aggregate = call.rel(0); final RelSubset child = call.rel(1); ... }

        }
        ```

        The AggregateRel is created with a TableAccessRel as its child. The rule first fires matching [AggregateRel, the logical subset]. Then if, for some reason, a physical subset is created that is sorted on [x], then the rule will fire again. with [AggregateRel, physical subset]. That rule can then create a SortedAggregateRel. Here's the sequence:

        1. Initial state

        ```
        Set1 { LogicalSubset

        { AggregateRel } }
        |
        Set0 { LogicalSubset { TableAccessRel } }
        ```

        2. A physical implementation of TableAccessRel occurs with new traits:

        ```
        Set1 { LogicalSubset { AggregateRel }

        }

        Set0 { LogicalSubset

        {TableAccessRel } PhysicalSubset { something } }
        ```

        3. Rule fires and creates a SortedAggregateRel:

        ```
        Set1 { LogicalSubset { AggregateRel } PhysicalSubset { SortedAggregateRel } }
        | |
        Set0 { LogicalSubset {TableAccessRel }

        PhysicalSubset

        { something }

        }
        ```

        In short, this kind of rule allows interesting orderings (and other interesting traits) to bubble up.

        Show
        github-import GitHub Import added a comment - [Date: Wed May 21 08:19:15 CEST 2014, Author: julianhyde ] Expanding on the example above. The rule would look like this: ```java class SortedAggregateRule { private SortedAggregateRule() { super( operand(AggregateRel.class, operand(RelSubset.class, any()))); } public void onMatch(RelOptRuleCall call) { final AggregateRel aggregate = call.rel(0); final RelSubset child = call.rel(1); ... } } ``` The AggregateRel is created with a TableAccessRel as its child. The rule first fires matching [AggregateRel, the logical subset] . Then if, for some reason, a physical subset is created that is sorted on [x] , then the rule will fire again. with [AggregateRel, physical subset] . That rule can then create a SortedAggregateRel. Here's the sequence: 1. Initial state ``` Set1 { LogicalSubset { AggregateRel } } | Set0 { LogicalSubset { TableAccessRel } } ``` 2. A physical implementation of TableAccessRel occurs with new traits: ``` Set1 { LogicalSubset { AggregateRel } } Set0 { LogicalSubset {TableAccessRel } PhysicalSubset { something } } ``` 3. Rule fires and creates a SortedAggregateRel: ``` Set1 { LogicalSubset { AggregateRel } PhysicalSubset { SortedAggregateRel } } | | Set0 { LogicalSubset {TableAccessRel } PhysicalSubset { something } } ``` In short, this kind of rule allows interesting orderings (and other interesting traits) to bubble up.
        Hide
        github-import GitHub Import added a comment -

        [Date: Wed May 21 20:11:25 CEST 2014, Author: jpullokkaran]

        If i understood the design, trait would bubble up by repeated firing of the rules (by allowing rule to be fired on subset).

        My concern is about scalability; this logic doesn't capture which paths in the tree can take advantage of the trait (needs trait) and hence trait would traverse up in all the paths towards the root of the tree.

        Show
        github-import GitHub Import added a comment - [Date: Wed May 21 20:11:25 CEST 2014, Author: jpullokkaran ] If i understood the design, trait would bubble up by repeated firing of the rules (by allowing rule to be fired on subset). My concern is about scalability; this logic doesn't capture which paths in the tree can take advantage of the trait (needs trait) and hence trait would traverse up in all the paths towards the root of the tree.
        Hide
        github-import GitHub Import added a comment -

        [Date: Wed May 21 20:50:48 CEST 2014, Author: julianhyde]

        On May 21, 2014, at 11:11 AM, jpullokkaran <notifications@github.com> wrote:
        > If i understood the design, trait would bubble up by repeated firing of the rules (by allowing rule to be fired on subset).
        >
        It wouldn't happen completely automatically. You would need to write a rule for each operator type (e.g., in the case of Hive, a HiveFilterRule, HiveSortRule, HiveAggregateRule etc.) Probably HiveFilterRule already exists and has one operand [FilterRel]; you would need to make it two operands [FilterRel, RelSubset], so that it gets invoked each time there is a new sort order.
        > My concern is about scalability; this logic doesn't capture which paths in the tree can take advantage of the trait (needs trait) and hence trait would traverse up in all the paths towards the root of the tree.
        >
        My guess is that scalability wouldn't be a huge problem. Most relational expressions will have only one or two interesting orderings. And there is not a combinatorial explosion – if you add a new ordering to X, it will propagate to X's ancestors, and it will cease to propagate when those columns are removed or it hits a blocking operator. So, a new ordering will typically create a dozen new RelNodes.

        If it is a problem, we can revisit.

        Julian

        Show
        github-import GitHub Import added a comment - [Date: Wed May 21 20:50:48 CEST 2014, Author: julianhyde ] On May 21, 2014, at 11:11 AM, jpullokkaran <notifications@github.com> wrote: > If i understood the design, trait would bubble up by repeated firing of the rules (by allowing rule to be fired on subset). > It wouldn't happen completely automatically. You would need to write a rule for each operator type (e.g., in the case of Hive, a HiveFilterRule, HiveSortRule, HiveAggregateRule etc.) Probably HiveFilterRule already exists and has one operand [FilterRel] ; you would need to make it two operands [FilterRel, RelSubset] , so that it gets invoked each time there is a new sort order. > My concern is about scalability; this logic doesn't capture which paths in the tree can take advantage of the trait (needs trait) and hence trait would traverse up in all the paths towards the root of the tree. > My guess is that scalability wouldn't be a huge problem. Most relational expressions will have only one or two interesting orderings. And there is not a combinatorial explosion – if you add a new ordering to X, it will propagate to X's ancestors, and it will cease to propagate when those columns are removed or it hits a blocking operator. So, a new ordering will typically create a dozen new RelNodes. If it is a problem, we can revisit. Julian
        Hide
        github-import GitHub Import added a comment -

        [Date: Thu May 22 00:54:01 CEST 2014, Author: jinfengni]

        Although I have not tried this new code change yet, I feel it is quit close to what I was hoping for. The ability to fire a rule whenever a new RelSubSet is added would make it easier to bubble up the interesting traits.

        I'm also a bit concerned about the scalability. Not sure if we could hit the combinatorial problem, if we have more traits in the planner. Could we add a way to specify the set of interesting traits that a new RelSubset should have, before the rule is fired? For instance, we want to specify the new RelSubset is physical and has non-empty RelCollation, or maybe physical and has hash-distribution, etc. I see operand() could specify one trait that the operand should match. Can we specify multiple traits to match in the operand specification?

        Show
        github-import GitHub Import added a comment - [Date: Thu May 22 00:54:01 CEST 2014, Author: jinfengni ] Although I have not tried this new code change yet, I feel it is quit close to what I was hoping for. The ability to fire a rule whenever a new RelSubSet is added would make it easier to bubble up the interesting traits. I'm also a bit concerned about the scalability. Not sure if we could hit the combinatorial problem, if we have more traits in the planner. Could we add a way to specify the set of interesting traits that a new RelSubset should have, before the rule is fired? For instance, we want to specify the new RelSubset is physical and has non-empty RelCollation, or maybe physical and has hash-distribution, etc. I see operand() could specify one trait that the operand should match. Can we specify multiple traits to match in the operand specification?
        Hide
        github-import GitHub Import added a comment -

        [Date: Thu May 22 00:58:16 CEST 2014, Author: julianhyde]

        You can write your rule so that it does nothing if the traits are not "interesting" to you.

        The easy way is to do this in your override of `RelOptRule.onMatch(RelOptRuleCall)`. But you can override `RelOptRuleOperand.matches(RelNode)`; a bit more effort, but it prevents the rule call from even being queued.

        Let's see how much people end up using this pattern. If it is used a lot, it might be useful to create a subclass of `RelOptRuleOperand` and a static method similar to `RelOptRule.operand` to create it.

        Show
        github-import GitHub Import added a comment - [Date: Thu May 22 00:58:16 CEST 2014, Author: julianhyde ] You can write your rule so that it does nothing if the traits are not "interesting" to you. The easy way is to do this in your override of `RelOptRule.onMatch(RelOptRuleCall)`. But you can override `RelOptRuleOperand.matches(RelNode)`; a bit more effort, but it prevents the rule call from even being queued. Let's see how much people end up using this pattern. If it is used a lot, it might be useful to create a subclass of `RelOptRuleOperand` and a static method similar to `RelOptRule.operand` to create it.

          People

          • Assignee:
            Unassigned
            Reporter:
            github-import GitHub Import
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development