Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1116

Extend simplify for reducing expressions

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.0
    • Component/s: core
    • Labels:
      None

      Description

      We would like to cover more cases in expression simplification, such as:

      x>5 and x is not null => x>5
      x>5 and x is null => not satisfiable
      x>5 and x<=5 => not satisfiable

        Issue Links

          Activity

          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - - edited

          Pull request in : https://github.com/apache/calcite/pull/202

          Julian Hyde, as you have worked recently on the simplify methods, could you take a look? Thanks

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - - edited Pull request in : https://github.com/apache/calcite/pull/202 Julian Hyde , as you have worked recently on the simplify methods, could you take a look? Thanks
          Hide
          julianhyde Julian Hyde added a comment - - edited

          In addition to (or perhaps instead of) the tests you added to RelOptRulesTest, can you add tests to RexProgramTest.testSimplify? These tests are more specific and easier to write and debug than RelOptRulesTest tests.

          Can you review null semantics carefully? "a > 5" is not equivalent to "NOT(a <= 5)" when there are nulls involved. There's not an easy solution. Usually boolean expressions have an implicit "... IS TRUE" so it doesn't matter, but it's hard to be sure you're in that situation. NullSafeVisitor may help, by making null semantics explicit; after NullSafeVisitor has made a pass, a would be converted to CAST(a AS ... NOT NULL) and therefore you can see that nulls are not in play.

          Do the HashSets in simplifyAnd2 make the behavior non-deterministic? (How ironic.) For instance, you iterate over notNullOperands. There's LinkedHashSet but also consider using the humble ArrayList (it's cheap). I don't know whether you expect to see term lists of a size where O( n ) contains would be a problem.

          Since RexNode doesn't override equals and hashCode maybe you should be building sets of strings.

          Your RexUtil.negate(RexCall) method does the same thing as NullSafeVisitor.negate(RexCall call) that I just added. Can you please combine. (Your negate method returns null in theory but not in practice. It should stop being wimpy and just throw.)

          Similarly, your RexUtil.invert method should leverage SqlKind.reverse or RexImplicationChecker.InputUsageFinder.reverse.

          On a couple of occasions you write termsSet.containsAll(new HashSet(collection)) where you could write termsSet.containsAll(collection).

          Show
          julianhyde Julian Hyde added a comment - - edited In addition to (or perhaps instead of) the tests you added to RelOptRulesTest, can you add tests to RexProgramTest.testSimplify? These tests are more specific and easier to write and debug than RelOptRulesTest tests. Can you review null semantics carefully? "a > 5" is not equivalent to "NOT(a <= 5)" when there are nulls involved. There's not an easy solution. Usually boolean expressions have an implicit "... IS TRUE" so it doesn't matter, but it's hard to be sure you're in that situation. NullSafeVisitor may help, by making null semantics explicit; after NullSafeVisitor has made a pass, a would be converted to CAST(a AS ... NOT NULL) and therefore you can see that nulls are not in play. Do the HashSets in simplifyAnd2 make the behavior non-deterministic? (How ironic.) For instance, you iterate over notNullOperands. There's LinkedHashSet but also consider using the humble ArrayList (it's cheap). I don't know whether you expect to see term lists of a size where O( n ) contains would be a problem. Since RexNode doesn't override equals and hashCode maybe you should be building sets of strings. Your RexUtil.negate(RexCall) method does the same thing as NullSafeVisitor.negate(RexCall call) that I just added. Can you please combine. (Your negate method returns null in theory but not in practice. It should stop being wimpy and just throw.) Similarly, your RexUtil.invert method should leverage SqlKind.reverse or RexImplicationChecker.InputUsageFinder.reverse . On a couple of occasions you write termsSet.containsAll(new HashSet(collection)) where you could write termsSet.containsAll(collection) .
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Thanks Julian Hyde for your comments. I did not had time to get back to this one before, but I will check them in detail and address them (and Vladimir Sitnikov's) by Monday so it is included in 1.7.

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Thanks Julian Hyde for your comments. I did not had time to get back to this one before, but I will check them in detail and address them (and Vladimir Sitnikov 's) by Monday so it is included in 1.7.
          Hide
          julianhyde Julian Hyde added a comment -

          Take your time... the departure of the 1.7 bus is not imminent.

          Show
          julianhyde Julian Hyde added a comment - Take your time... the departure of the 1.7 bus is not imminent.
          Hide
          julianhyde Julian Hyde added a comment -

          Jesus Camacho Rodriguez, Can you get this done by Wednesday?

          Show
          julianhyde Julian Hyde added a comment - Jesus Camacho Rodriguez , Can you get this done by Wednesday?
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          No problem, I will.

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - No problem, I will.
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Julian Hyde, Vladimir Sitnikov, created a new pull request addressing all your comments.

          About null semantics, indeed the comment in the source code of the previous pull request was misleading, but the logic was correct: for instance, the expression a>5 AND a<=5 will always return false.

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Julian Hyde , Vladimir Sitnikov , created a new pull request addressing all your comments. About null semantics, indeed the comment in the source code of the previous pull request was misleading, but the logic was correct: for instance, the expression a>5 AND a<=5 will always return false.
          Hide
          julianhyde Julian Hyde added a comment -

          No it won't. If a is null then a > 5 AND a <= 5 evaluates to UNKNOWN.

          Show
          julianhyde Julian Hyde added a comment - No it won't. If a is null then a > 5 AND a <= 5 evaluates to UNKNOWN.
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - - edited

          True, sorry about that, I was thinking about solely about expressions in Filter operators and I got confused...

          Is it safe to transform UNKNOWN to FALSE if the expression above is part of a Filter condition (as in both cases they will be filtered out)? Or there might be some side-effects I am not taking into account?

          I updated the pull request with a code path for simplifying expressions in Filter operators, and a different code path for simplifying other expressions.

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - - edited True, sorry about that, I was thinking about solely about expressions in Filter operators and I got confused... Is it safe to transform UNKNOWN to FALSE if the expression above is part of a Filter condition (as in both cases they will be filtered out)? Or there might be some side-effects I am not taking into account? I updated the pull request with a code path for simplifying expressions in Filter operators, and a different code path for simplifying other expressions.
          Hide
          julianhyde Julian Hyde added a comment -

          I renamed "filterCondition" to "unknownAsFalse" to make the semantics explicit.

          I still have doubts about how we approach null semantics. I think it's very difficult to catch every last bug. For example, it doesn't seem valid to pass unknownAsFalse unchanged into simplifyNot. If you have WHERE NOT (a < 10 AND a >= 10), WHERE will treat an unknown result of NOT (a < 10 AND a >= 10) as false, but that would mean treating an unknown result of a < 10 AND a >= 10 as true. I think there are similar issues in CASE. I don't propose that you or we fix all of these problems; we should just spray-paint "architecturally unsafe" on the whole thing and start looking for a better solution.

          With these reservations, patch looks good, and I will commit shortly.

          Show
          julianhyde Julian Hyde added a comment - I renamed "filterCondition" to "unknownAsFalse" to make the semantics explicit. I still have doubts about how we approach null semantics. I think it's very difficult to catch every last bug. For example, it doesn't seem valid to pass unknownAsFalse unchanged into simplifyNot. If you have WHERE NOT (a < 10 AND a >= 10) , WHERE will treat an unknown result of NOT (a < 10 AND a >= 10) as false, but that would mean treating an unknown result of a < 10 AND a >= 10 as true. I think there are similar issues in CASE. I don't propose that you or we fix all of these problems; we should just spray-paint "architecturally unsafe" on the whole thing and start looking for a better solution. With these reservations, patch looks good, and I will commit shortly.
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Julian Hyde, thanks for the feedback. I agree, thus let's keep it on the safe side and pass only "unknownAsFalse" through AND operators; then for e.g. NOT or CASE, we will not pass the flag set to true. This will be safe, it will be already useful and cover all cases we are currently targeting in Hive. In future releases, we can revisit this issue if necessary, and extend simplify to cover more cases.

          I created a new pull request based on your commit (I saw you added some comments, plus the flag renamed to "unknownAsFalse"), but in the new pull request we only pass the parameter through AND operations. Could you please pick up that one? Thanks!

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Julian Hyde , thanks for the feedback. I agree, thus let's keep it on the safe side and pass only "unknownAsFalse" through AND operators; then for e.g. NOT or CASE, we will not pass the flag set to true. This will be safe, it will be already useful and cover all cases we are currently targeting in Hive. In future releases, we can revisit this issue if necessary, and extend simplify to cover more cases. I created a new pull request based on your commit (I saw you added some comments, plus the flag renamed to "unknownAsFalse"), but in the new pull request we only pass the parameter through AND operations. Could you please pick up that one? Thanks!
          Hide
          julianhyde Julian Hyde added a comment -

          Jesus Camacho Rodriguez, Thanks, I'll amend the commit.

          Show
          julianhyde Julian Hyde added a comment - Jesus Camacho Rodriguez , Thanks, I'll amend the commit.
          Show
          julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/3a54e54f .
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Thanks Julian Hyde, awesome!

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Thanks Julian Hyde , awesome!
          Hide
          julianhyde Julian Hyde added a comment -

          Resolved in release 1.7.0 (2016-03-22).

          Show
          julianhyde Julian Hyde added a comment - Resolved in release 1.7.0 (2016-03-22).

            People

            • Assignee:
              jcamachorodriguez Jesus Camacho Rodriguez
              Reporter:
              jcamachorodriguez Jesus Camacho Rodriguez
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development