Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-606

Fix Trait Propagation and add test case

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0-incubating
    • Component/s: None
    • Labels:
      None

      Description

      Per my email to the dev list, it seems like we need to have a Trait propagation test case. Some things also seem broken. Dev list email: http://mail-archives.apache.org/mod_mbox/incubator-calcite-dev/201503.mbox/browser

        Activity

        Show
        jnadeau Jacques Nadeau added a comment - Working branch is here: https://github.com/jacques-n/incubator-calcite/tree/CALCITE-606
        Hide
        julianhyde Julian Hyde added a comment -

        I've committed a fix for you to review, https://github.com/julianhyde/incubator-calcite/commit/fb203dc4b9aea89bfed839c22ae3e285044df400. Some changes to core calcite, but quite a few fixes to your test case also. See my commit comment for details.

        I'd like to get the changes to core calcite in ASAP. Can you review, clean up the test (or throw it away) and rebase onto latest master.

        Show
        julianhyde Julian Hyde added a comment - I've committed a fix for you to review, https://github.com/julianhyde/incubator-calcite/commit/fb203dc4b9aea89bfed839c22ae3e285044df400 . Some changes to core calcite, but quite a few fixes to your test case also. See my commit comment for details. I'd like to get the changes to core calcite in ASAP. Can you review, clean up the test (or throw it away) and rebase onto latest master.
        Hide
        jni Jinfeng Ni added a comment -

        I have two comments.

        1. Seems SortRemoveRule is not required, in order to have withoutHack() run successful.
        I commented out SortRemoveRule in RULES_NO_HACK, and withoutHack() is still successful.

        If the trait propagation works properly, then I think SortRemovalRule is not necessary; why do we have to insert SORT first, then rely on SortRemoveRule to remove the unnecessary SORT later on?

        2. If I understand correctly, the main reason that withoutHack() in the modified testcase is successful, is because of the following (line 156)

                @Override public Statistic getStatistic() {
                  return Statistics.of(100d, ImmutableList.<ImmutableBitSet>of(),
                      ImmutableList.of(COLLATION));
                }
        

        That is, the logical EnumerableTableScan now has COLLATION. That is different from the original testcase, where COLLATION is only known when logical TableScan is converted into physical Scan operator. Actually, that's the main reason why we would like to have trait propagation work : some traits, ( RelCollation , partition/distribution etc ) are not known until in Physical operators, and at the Physical planning stage, we want to have those traits propagated properly, such that there would be no unnecessary operator (SORT, or EXCHANGE) would be inserted into the plan.

        Actually, If I remove COLLATION from logical TableScan, then withoutHack() will fail with CanNotPlan exception, meaning RelCollation propagated is not coming from the Physical TableScan, but from Logical TableScan.

        Not sure whether Jacques thinks this patch meets his need or not.

        Show
        jni Jinfeng Ni added a comment - I have two comments. 1. Seems SortRemoveRule is not required, in order to have withoutHack() run successful. I commented out SortRemoveRule in RULES_NO_HACK, and withoutHack() is still successful. If the trait propagation works properly, then I think SortRemovalRule is not necessary; why do we have to insert SORT first, then rely on SortRemoveRule to remove the unnecessary SORT later on? 2. If I understand correctly, the main reason that withoutHack() in the modified testcase is successful, is because of the following (line 156) @Override public Statistic getStatistic() { return Statistics.of(100d, ImmutableList.<ImmutableBitSet>of(), ImmutableList.of(COLLATION)); } That is, the logical EnumerableTableScan now has COLLATION. That is different from the original testcase, where COLLATION is only known when logical TableScan is converted into physical Scan operator. Actually, that's the main reason why we would like to have trait propagation work : some traits, ( RelCollation , partition/distribution etc ) are not known until in Physical operators, and at the Physical planning stage, we want to have those traits propagated properly, such that there would be no unnecessary operator (SORT, or EXCHANGE) would be inserted into the plan. Actually, If I remove COLLATION from logical TableScan, then withoutHack() will fail with CanNotPlan exception, meaning RelCollation propagated is not coming from the Physical TableScan, but from Logical TableScan. Not sure whether Jacques thinks this patch meets his need or not.
        Hide
        julianhyde Julian Hyde added a comment -

        Re 1. Yes. Because collation is a trait Calcite can deduce the same thing via another path.

        Re 2. My patch is close to minimal. If specify getStatistic but if you don't use EnumerableTableScan.create, or if you don't override RelOptTable.unwrap, it will not be able to deduce collations for the table and therefore will still need the sort.

        Jacques Let me know ASAP whether this patch meets your needs. It is the only thing preventing me from moving to a release vote.

        Show
        julianhyde Julian Hyde added a comment - Re 1. Yes. Because collation is a trait Calcite can deduce the same thing via another path. Re 2. My patch is close to minimal. If specify getStatistic but if you don't use EnumerableTableScan.create, or if you don't override RelOptTable.unwrap, it will not be able to deduce collations for the table and therefore will still need the sort. Jacques Let me know ASAP whether this patch meets your needs. It is the only thing preventing me from moving to a release vote.
        Hide
        jni Jinfeng Ni added a comment -

        Re 2. In the testcase, collations are supposedly to kick in when logical SCAN is converted into physical SCAN. collations are not know in logical SCAN phase.

        Line :299
              call.transformTo(new PhysTable(rel.getCluster()));
        
        Line 383
            public PhysTable(RelOptCluster cluster) {
              super(cluster, cluster.traitSet().replace(PHYSICAL).replace(COLLATION));
        

        I think the assumption of knowing traits only in physical stage makes senses : in logical plan, we have

                   Aggregate
                      \
                      PROJECT
                         \
                        SCAN
        

        We know the collation of SCAN only In physical stage. If the table has the required collation, no SORT is required. If no collation exists, Calcite either generates a plan by inserting SORT and do a SORT-based aggregation, or generates a plan which uses hash-based aggregation. In that way, the logical aggregate could be converted into SORT-based AGG or hash-based AGG, depending on the collation of SCAN known in physical stage.

        PS. I also have getStatistics, and use unwrap(). The only change I made is to provide an empty collation list to logical SCAN.

               @Override public Statistic getStatistic() {
                  return Statistics.of(100d, ImmutableList.<ImmutableBitSet>of(),
                      ImmutableList.<RelCollation>of());    // empty collation list for logical SCAN. 
                      // ImmutableList.of(COLLATION));
         
        Show
        jni Jinfeng Ni added a comment - Re 2. In the testcase, collations are supposedly to kick in when logical SCAN is converted into physical SCAN. collations are not know in logical SCAN phase. Line :299 call.transformTo( new PhysTable(rel.getCluster())); Line 383 public PhysTable(RelOptCluster cluster) { super (cluster, cluster.traitSet().replace(PHYSICAL).replace(COLLATION)); I think the assumption of knowing traits only in physical stage makes senses : in logical plan, we have Aggregate \ PROJECT \ SCAN We know the collation of SCAN only In physical stage. If the table has the required collation, no SORT is required. If no collation exists, Calcite either generates a plan by inserting SORT and do a SORT-based aggregation, or generates a plan which uses hash-based aggregation. In that way, the logical aggregate could be converted into SORT-based AGG or hash-based AGG, depending on the collation of SCAN known in physical stage. PS. I also have getStatistics, and use unwrap(). The only change I made is to provide an empty collation list to logical SCAN. @Override public Statistic getStatistic() { return Statistics.of(100d, ImmutableList.<ImmutableBitSet>of(), ImmutableList.<RelCollation>of()); // empty collation list for logical SCAN. // ImmutableList.of(COLLATION));
        Hide
        jnadeau Jacques Nadeau added a comment -

        I think we need to delay this until the next release. I don't think your patch solves the problems. The goal is that a physical transformation provides a trait that should be then leveraged in another transformation. My example is simplistic on purpose. In that case one could move collation to be a logical trait of the original node and resolve the issues. However, the set of examples we're actually trying to solve do not allow this.

        Show
        jnadeau Jacques Nadeau added a comment - I think we need to delay this until the next release. I don't think your patch solves the problems. The goal is that a physical transformation provides a trait that should be then leveraged in another transformation. My example is simplistic on purpose. In that case one could move collation to be a logical trait of the original node and resolve the issues. However, the set of examples we're actually trying to solve do not allow this.
        Hide
        amansinha100 Aman Sinha added a comment -

        This patch won't fix the distribution trait propagation, right ? If I have a join condition on a1 = a2 followed by group-by on a1 and I have distributed the left input of the join on a1, then we would do another re-distribution for the group-by. This property is not known during logical planning, so I am not sure how we can accomplish this in the new code.

        Show
        amansinha100 Aman Sinha added a comment - This patch won't fix the distribution trait propagation, right ? If I have a join condition on a1 = a2 followed by group-by on a1 and I have distributed the left input of the join on a1, then we would do another re-distribution for the group-by. This property is not known during logical planning, so I am not sure how we can accomplish this in the new code.
        Hide
        julianhyde Julian Hyde added a comment -

        Aman Sinha, you might be right that it doesn't fix distribution trait propagation. Maybe the case you cite can be handled using RelMdPredicates (predicate inference) - if you are distributed on x and a predicate tells you that x = y then you are also distributed on y.

        There is more work to be done, but as I discussed in CALCITE-628, I think my fix is sound and we should commit it.

        Show
        julianhyde Julian Hyde added a comment - Aman Sinha , you might be right that it doesn't fix distribution trait propagation. Maybe the case you cite can be handled using RelMdPredicates (predicate inference) - if you are distributed on x and a predicate tells you that x = y then you are also distributed on y. There is more work to be done, but as I discussed in CALCITE-628 , I think my fix is sound and we should commit it.
        Show
        julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/b312031f and http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/3b55c35a .
        Hide
        julianhyde Julian Hyde added a comment -

        Resolved in release 1.2.0-incubating (2015-04-16)

        Show
        julianhyde Julian Hyde added a comment - Resolved in release 1.2.0-incubating (2015-04-16)

          People

          • Assignee:
            jnadeau Jacques Nadeau
            Reporter:
            jnadeau Jacques Nadeau
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development