Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1960

RelMdPredicates.getPredicates is slow if there are many equivalent columns

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.15.0
    • Component/s: core
    • Labels:
      None

      Description

      When inferring pulled up predicates, multiple mappings are generated to
      make sure equivalent expressions can be substitute. E.g., for an expression
      'a + b + c' and the following equivalences:

      a : {a, b}
      b : {a, b}
      c : {c, e}
      

      should generate:

      a + a + c
      a + a + e
      a + b + c
      a + b + e
      b + a + c
      b + a + e
      b + b + c
      b + b + e
      

      The mapping generation is a typical permutation generation process. However,
      the current code is not handling the permutation well, thus causing duplicated
      mappings.

        Activity

        Hide
        julianhyde Julian Hyde added a comment -

        Resolved in release 1.15.0 (2017-12-11).

        Show
        julianhyde Julian Hyde added a comment - Resolved in release 1.15.0 (2017-12-11).
        Hide
        julianhyde Julian Hyde added a comment -
        Show
        julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/f1a002ea ; thanks for the PR, LeoWangLZ !
        Hide
        perid007 LeoWangLZ added a comment -

        Thanks

        Show
        perid007 LeoWangLZ added a comment - Thanks
        Hide
        julianhyde Julian Hyde added a comment -

        I agree, the running time is so bad without your fix that we can set timeout = 20 seconds and treat that as a failure. Test runs in 6 seconds after your fix. Reviewing and testing now, and will commit after 1.14.

        Show
        julianhyde Julian Hyde added a comment - I agree, the running time is so bad without your fix that we can set timeout = 20 seconds and treat that as a failure. Test runs in 6 seconds after your fix. Reviewing and testing now, and will commit after 1.14.
        Hide
        perid007 LeoWangLZ added a comment -

        Julian Hyde IC. I add a test, it run some seconds with my fix, but it run more than 15 min without my fix. I think it's a negative case. Please review again, Thanks

        Show
        perid007 LeoWangLZ added a comment - Julian Hyde IC. I add a test, it run some seconds with my fix, but it run more than 15 min without my fix. I think it's a negative case. Please review again, Thanks
        Hide
        julianhyde Julian Hyde added a comment -

        We need a test that will fail without your fix, succeed with it. It will detect if (say) someone accidentally reverts your change in future. We can't engineer a project this scale unless every feature has an automatic test. I hope you can understand that.

        Show
        julianhyde Julian Hyde added a comment - We need a test that will fail without your fix, succeed with it. It will detect if (say) someone accidentally reverts your change in future. We can't engineer a project this scale unless every feature has an automatic test. I hope you can understand that.
        Hide
        perid007 LeoWangLZ added a comment - - edited

        Julian Hyde Yes, the test is succeeful without my fix, but for the ExprItr it loop 5 time, only 4 times after my fix.
        the example is like:
        For the ExprItr,

        columns are [12,15]
        columnSets are [{3,12}, {6,15}]
        

        the loop times is the times of the permutation, so the loop times is 4(2 * 2), but now it's 5 times. and the test don't express it by result. it's my reason that it make test hardly and express it wrong. Sometimes it loop more and more times that it may take long long time.

        Show
        perid007 LeoWangLZ added a comment - - edited Julian Hyde Yes, the test is succeeful without my fix, but for the ExprItr it loop 5 time, only 4 times after my fix. the example is like: For the ExprItr, columns are [12,15] columnSets are [{3,12}, {6,15}] the loop times is the times of the permutation, so the loop times is 4(2 * 2), but now it's 5 times. and the test don't express it by result. it's my reason that it make test hardly and express it wrong. Sometimes it loop more and more times that it may take long long time.
        Hide
        julianhyde Julian Hyde added a comment -

        The test you added in https://github.com/apache/calcite/commit/ff1b934 succeeds even without your fix.

        Show
        julianhyde Julian Hyde added a comment - The test you added in https://github.com/apache/calcite/commit/ff1b934 succeeds even without your fix.
        Hide
        perid007 LeoWangLZ added a comment -

        Julian Hyde Thanks for your suggestion, I have added test in RelMetadataTest.java. Please review it again, Thanks.

        Show
        perid007 LeoWangLZ added a comment - Julian Hyde Thanks for your suggestion, I have added test in RelMetadataTest.java. Please review it again, Thanks.
        Hide
        julianhyde Julian Hyde added a comment -

        Rather than making ExprsItr public, can you reproduce this on tests with particular SQL? RelMetadataTest.testPullUpPredicatesFromAggregation is an example that you might follow.

        Show
        julianhyde Julian Hyde added a comment - Rather than making ExprsItr public, can you reproduce this on tests with particular SQL? RelMetadataTest.testPullUpPredicatesFromAggregation is an example that you might follow.
        Hide
        perid007 LeoWangLZ added a comment -

        https://github.com/apache/calcite/pull/530 Julian Hyde
        The iterator generate more result in RelMdPredicates#ExprsItr.
        When compute next mapping, if it has no more element(t < 0) and the level is not 0, then it go next level recursively, and the current level need restart. now it reset iterationIdx to 0, when it will go the next level recursively, the upper level is still the last element instead of the first one. So when next, we restart set nextMapping and iterationIdx like initializeMapping, or it go the next level recursively, the upper level is still the last element instead of the first one.
        The class ExprsItr is not public, so it make test cases hardly. We can cast ExprsItr to public or don't make test cases.

        Show
        perid007 LeoWangLZ added a comment - https://github.com/apache/calcite/pull/530 Julian Hyde The iterator generate more result in RelMdPredicates#ExprsItr. When compute next mapping, if it has no more element(t < 0) and the level is not 0, then it go next level recursively, and the current level need restart. now it reset iterationIdx to 0, when it will go the next level recursively, the upper level is still the last element instead of the first one. So when next, we restart set nextMapping and iterationIdx like initializeMapping, or it go the next level recursively, the upper level is still the last element instead of the first one. The class ExprsItr is not public, so it make test cases hardly. We can cast ExprsItr to public or don't make test cases.

          People

          • Assignee:
            julianhyde Julian Hyde
            Reporter:
            perid007 LeoWangLZ
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 1h
              1h
              Remaining:
              Remaining Estimate - 1h
              1h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development