Derby
  1. Derby
  2. DERBY-4416

Handle comparison of two constants as a boolean constant

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 10.5.3.0
    • Fix Version/s: 10.6.1.0
    • Component/s: SQL
    • Labels:
      None
    • Bug behavior facts:
      Performance

      Description

      In the lack of the boolean data type, Derby forces you to use expressions like 1=1 and 1<>1 to express true and false. Generated SQL statements also tend to use such expressions, and so does Derby in its own meta-data queries.

      Derby has many useful optimizations for boolean true/false. For instance, ProjectRestrictNode and PredicateList are able to eliminate predicates, and in some cases the entire ProjectRestrictNode, if the predicate contains constant true or false values. However, during parsing and compilation, expressions like 1=1 are not rewritten to TRUE, and we don't get any benefit from the boolean optimization code. This leads to more complex, and possibly less efficient, byte code being generated for the statements.

      Also, boolean constants are assigned a selectivity of 0.0 (false) or 1.0 (true), since they will always match no rows when false and all rows when true. The expression 1=1 does however get it's selectivity from the = operator, which means that it'll be 0.1. The same selectivity is assigned to 1=0. Other operators have different selectivity, so 2<3 has the selectivity 0.33, even though the actual selectivity of the expression is the same as 1=1 and TRUE, namely 1.0.

      This leads to oddities like the optimizer choosing a different plan when you change 2<3 to 1=1 in a WHERE clause. See http://mail-archives.apache.org/mod_mbox/db-derby-user/200909.mbox/%3c25531166.post@talk.nabble.com%3e for an example of that.

      If we could go through the query tree and replace occurrences of comparisons between constant values with a boolean constant at bind time, such queries would end up with simpler byte code, and the selectivity passed to the optimizer would be more accurate, possibly resulting in a better plan being chosen.

      1. replaceExpressions.diff
        6 kB
        Knut Anders Hatlen
      2. d4416-1a.diff
        17 kB
        Knut Anders Hatlen
      3. d4416-1a.stat
        0.6 kB
        Knut Anders Hatlen

        Issue Links

          Activity

          Gavin made changes -
          Workflow jira [ 12480042 ] Default workflow, editable Closed status [ 12800823 ]
          Knut Anders Hatlen made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Hide
          Knut Anders Hatlen added a comment -

          Thanks, Dag! I changed the wording to this in revision 880791:

          • @return a node representing a Boolean constant if the result of the
          • operator is known; otherwise, this operator node
          Show
          Knut Anders Hatlen added a comment - Thanks, Dag! I changed the wording to this in revision 880791: @return a node representing a Boolean constant if the result of the operator is known; otherwise, this operator node
          Hide
          Dag H. Wanvik added a comment -

          Thanks for this patch, Knut. The placement of accept for the new visitor in SMLStatementNode looks good.
          +1.

          Nit: in BinaryRelationalOperatorNode#evaluateConstantExpressions's
          Javadoc, the text in the @return tag is slightly misleading:

          @return a Boolean constant if the result of the operator is known, or
          the operator node otherwise

          One could be led to believe is is returning a Boolean object; I like better the phrasing you use for
          #newBool: "@return a node representing a Boolean constant".

          Show
          Dag H. Wanvik added a comment - Thanks for this patch, Knut. The placement of accept for the new visitor in SMLStatementNode looks good. +1. Nit: in BinaryRelationalOperatorNode#evaluateConstantExpressions's Javadoc, the text in the @return tag is slightly misleading: @return a Boolean constant if the result of the operator is known, or the operator node otherwise One could be led to believe is is returning a Boolean object; I like better the phrasing you use for #newBool: "@return a node representing a Boolean constant".
          Knut Anders Hatlen made changes -
          Status In Progress [ 3 ] Resolved [ 5 ]
          Issue & fix info [Patch Available]
          Fix Version/s 10.6.0.0 [ 12313727 ]
          Resolution Fixed [ 1 ]
          Hide
          Knut Anders Hatlen added a comment -

          Committed revision 880693.

          I'm marking the issue as resolved. If there are other expressions that we want to rewrite using this mechanism, we can file separate issues for those improvements.

          Show
          Knut Anders Hatlen added a comment - Committed revision 880693. I'm marking the issue as resolved. If there are other expressions that we want to rewrite using this mechanism, we can file separate issues for those improvements.
          Knut Anders Hatlen made changes -
          Issue & fix info [Patch Available]
          Hide
          Knut Anders Hatlen added a comment -

          All the regression tests ran cleanly.

          Show
          Knut Anders Hatlen added a comment - All the regression tests ran cleanly.
          Knut Anders Hatlen made changes -
          Attachment d4416-1a.diff [ 12424863 ]
          Attachment d4416-1a.stat [ 12424864 ]
          Hide
          Knut Anders Hatlen added a comment -

          Here's an updated patch with the visitor renamed to ConstantExpressionVisitor, the method in ValueNode renamed to evaluateConstantExpression(), and with updated comments.

          Description of the other changes from the previous patch:

          1) The invocation of the visitor is moved to DMLStatementNode.optimizeStatement(). The previous patch called it from SelectNode and JoinNode on WHERE/HAVING/ON clauses. As mentioned by Bryan, this transformation can be useful in other parts of the tree as well (as seen in VALUES 1+3, although evaluation of arithmetic operations hasn't implemented). I think it's cleaner to have the invocation at one single location. Also, since there might be sub-queries inside WHERE clauses, the old patch might traverse some parts of the tree many times.

          The visitor now walks the entire query tree between the preprocessing phase and the optimization phase. Doing it as late as possible is advantageous because then other simplifications (like the NOT elimination performed in preprocess()) increase the chances of finding expressions with a know value, but it should be performed before optimization so that the optimizer can take advantage of the more accurate selectivity predictions.

          2) JoinNode: Made acceptChildren() call accept() on joinPredicates. The previous patch visited the JoinNode a little earlier, when the ON clause was still represented by joinClause, whereas this patch visits the JoinNode after the predicates have been moved to joinPredicates. There are also other children of JoinNode that are not visited, but I didn't do anything with them for now.

          3) BinaryRelationalOperatorNode: Added rewriting for less-than and less-equals, which were forgotten in the previous patch.

          4) logop.out: Accept that a statement that used to fail with arithmetic overflow now succeeds because the arithmetic isn't performed at runtime. The statement that fails is
          > select x from s where 2147483647 + 10 = 2 and (1=2);
          whereas this almost identical one is expected to succeed
          > select x from s where (1=2) and 2147483647 + 10 = 2;
          I'm not aware of anything in the SQL standard that requires us to evaluate the arithmetic expression first, so I think it's equally OK to pass the first statement as it is to pass the second statement. The comments in the test also indicate this.

          4) specjPlans.out: Removed unneeded ProjectRestrictResultSet from two of the query plans (they were there to enforce the restriction 1=1).

          5) outerjoin.out: Updated query plan from nested loop join to hash join in a join with 1=1 in the ON clause. New plan looks OK and the results are still correct. The plan probably changed because 1=1 and TRUE have different (estimated) selectivity.

          I haven't run all the regression tests on the latest revision of the patch. Will report back when I have the results.

          Show
          Knut Anders Hatlen added a comment - Here's an updated patch with the visitor renamed to ConstantExpressionVisitor, the method in ValueNode renamed to evaluateConstantExpression(), and with updated comments. Description of the other changes from the previous patch: 1) The invocation of the visitor is moved to DMLStatementNode.optimizeStatement(). The previous patch called it from SelectNode and JoinNode on WHERE/HAVING/ON clauses. As mentioned by Bryan, this transformation can be useful in other parts of the tree as well (as seen in VALUES 1+3, although evaluation of arithmetic operations hasn't implemented). I think it's cleaner to have the invocation at one single location. Also, since there might be sub-queries inside WHERE clauses, the old patch might traverse some parts of the tree many times. The visitor now walks the entire query tree between the preprocessing phase and the optimization phase. Doing it as late as possible is advantageous because then other simplifications (like the NOT elimination performed in preprocess()) increase the chances of finding expressions with a know value, but it should be performed before optimization so that the optimizer can take advantage of the more accurate selectivity predictions. 2) JoinNode: Made acceptChildren() call accept() on joinPredicates. The previous patch visited the JoinNode a little earlier, when the ON clause was still represented by joinClause, whereas this patch visits the JoinNode after the predicates have been moved to joinPredicates. There are also other children of JoinNode that are not visited, but I didn't do anything with them for now. 3) BinaryRelationalOperatorNode: Added rewriting for less-than and less-equals, which were forgotten in the previous patch. 4) logop.out: Accept that a statement that used to fail with arithmetic overflow now succeeds because the arithmetic isn't performed at runtime. The statement that fails is > select x from s where 2147483647 + 10 = 2 and (1=2); whereas this almost identical one is expected to succeed > select x from s where (1=2) and 2147483647 + 10 = 2; I'm not aware of anything in the SQL standard that requires us to evaluate the arithmetic expression first, so I think it's equally OK to pass the first statement as it is to pass the second statement. The comments in the test also indicate this. 4) specjPlans.out: Removed unneeded ProjectRestrictResultSet from two of the query plans (they were there to enforce the restriction 1=1). 5) outerjoin.out: Updated query plan from nested loop join to hash join in a join with 1=1 in the ON clause. New plan looks OK and the results are still correct. The plan probably changed because 1=1 and TRUE have different (estimated) selectivity. I haven't run all the regression tests on the latest revision of the patch. Will report back when I have the results.
          Knut Anders Hatlen made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          Knut Anders Hatlen made changes -
          Assignee Knut Anders Hatlen [ knutanders ]
          Knut Anders Hatlen made changes -
          Link This issue is blocked by DERBY-4421 [ DERBY-4421 ]
          Hide
          Knut Anders Hatlen added a comment -

          Yes, that's the idea. Or 3+1<4 --> 4<4 ---> false. I'm not sure if we want to do it, but I thought that since it would be possible using the same mechanism, there's no reason to have "boolean" in the method/class names.

          Show
          Knut Anders Hatlen added a comment - Yes, that's the idea. Or 3+1<4 --> 4<4 ---> false. I'm not sure if we want to do it, but I thought that since it would be possible using the same mechanism, there's no reason to have "boolean" in the method/class names.
          Hide
          Bryan Pendleton added a comment -

          I'm not sure I understand your more-general idea.

          Would it mean that, e.g.,

          select 1+2 from t

          would be transformed into

          select 3 from t

          Show
          Bryan Pendleton added a comment - I'm not sure I understand your more-general idea. Would it mean that, e.g., select 1+2 from t would be transformed into select 3 from t
          Hide
          Knut Anders Hatlen added a comment -

          The visitor and the methods in the patch could probably have more general names, since they could be used to all sorts of expression-to-constant transformations.

          So instead of the ValueNode.getBooleanEquivalent() method in the current patch, we could have a ValueNode.getConstantEquivalent(). The override in BinaryRelationalOperatorNode would return a boolean constant like in the patch. In addition, possibly as part of separate issues later, overrides could be added to other nodes, like in BinaryArithmeticOperatorNode to return the result of arithmetic operations as a (non-boolean) constant.

          Currently, since the accept() methods call visit() on the parent node before the children, the patch is not able to fully rewrite expressions like ((1=1) = (3=3)) to true (well, this expression was probably disallowed by the recent DERBY-887 commit, but it will probably be allowed again when the boolean data type is reintroduced). The rewriting stops at (true = true). Making BinaryRelationalOperatorNode.getBooleanEquivalent() call getBooleanEquivalent() on the left operand and the right operand before checking whether they are constant nodes, should fix this, I think.

          Adding an override in IsNullNode to handle IS NULL/IS NOT NULL should be fairly trivial too, I think.

          Show
          Knut Anders Hatlen added a comment - The visitor and the methods in the patch could probably have more general names, since they could be used to all sorts of expression-to-constant transformations. So instead of the ValueNode.getBooleanEquivalent() method in the current patch, we could have a ValueNode.getConstantEquivalent(). The override in BinaryRelationalOperatorNode would return a boolean constant like in the patch. In addition, possibly as part of separate issues later, overrides could be added to other nodes, like in BinaryArithmeticOperatorNode to return the result of arithmetic operations as a (non-boolean) constant. Currently, since the accept() methods call visit() on the parent node before the children, the patch is not able to fully rewrite expressions like ((1=1) = (3=3)) to true (well, this expression was probably disallowed by the recent DERBY-887 commit, but it will probably be allowed again when the boolean data type is reintroduced). The rewriting stops at (true = true). Making BinaryRelationalOperatorNode.getBooleanEquivalent() call getBooleanEquivalent() on the left operand and the right operand before checking whether they are constant nodes, should fix this, I think. Adding an override in IsNullNode to handle IS NULL/IS NOT NULL should be fairly trivial too, I think.
          Knut Anders Hatlen made changes -
          Field Original Value New Value
          Attachment replaceExpressions.diff [ 12422800 ]
          Hide
          Knut Anders Hatlen added a comment -

          Attached is a patch (not for commit) that I've experimented with. It adds a visitor that replaces (some) expressions that are guaranteed to evaluate to true or false with a boolean constant node. This visitor is used on the WHERE clauses and HAVING clauses from SelectNode.normExpressions() and on the ON clauses in JoinNode.normExpressions().

          I've seen that for instance this statement

          SELECT * FROM T WHERE 1=1

          is translated into a ProjectRestrictResultSet on top of a table scan without the patch, and simply to a table scan with the patch.

          I tested it with suites.All and derbyall, and saw three failures:

          1) lang/outerjoin.sql: Changed query plans. Looks like the expected changes, in that some ProjectRestrictResultSets are eliminated, but will have to check to be sure.

          2) lang/specjPlans.sql: Changed query plans. These also look benign, but will have to check.

          3) lang/logop.sql: Query that previously failed now succeeds and returns no rows. Expected output is below, and judging by the comment in the test, it looks as if the empty result that was actually supposed to be the expected one. The reason why it stops failing, is that (1=2) is rewritten to false and PredicateList.restorePredicates() is therefore able eliminate the other half of the AND node (2147483647 + 10 = 2) that causes integer overflow.

          ij> – ... and false and ... should get resolved to false
          select x from s where 2147483647 + 10 = 2 and (1=2);
          ERROR 22003: The resulting value is outside the range for the data type INTEGER.

          Now:

          ij> – ... and false and ... should get resolved to false
          select x from s where 2147483647 + 10 = 2 and (1=2);
          X
          -----------

          Show
          Knut Anders Hatlen added a comment - Attached is a patch (not for commit) that I've experimented with. It adds a visitor that replaces (some) expressions that are guaranteed to evaluate to true or false with a boolean constant node. This visitor is used on the WHERE clauses and HAVING clauses from SelectNode.normExpressions() and on the ON clauses in JoinNode.normExpressions(). I've seen that for instance this statement SELECT * FROM T WHERE 1=1 is translated into a ProjectRestrictResultSet on top of a table scan without the patch, and simply to a table scan with the patch. I tested it with suites.All and derbyall, and saw three failures: 1) lang/outerjoin.sql: Changed query plans. Looks like the expected changes, in that some ProjectRestrictResultSets are eliminated, but will have to check to be sure. 2) lang/specjPlans.sql: Changed query plans. These also look benign, but will have to check. 3) lang/logop.sql: Query that previously failed now succeeds and returns no rows. Expected output is below, and judging by the comment in the test, it looks as if the empty result that was actually supposed to be the expected one. The reason why it stops failing, is that (1=2) is rewritten to false and PredicateList.restorePredicates() is therefore able eliminate the other half of the AND node (2147483647 + 10 = 2) that causes integer overflow. ij> – ... and false and ... should get resolved to false select x from s where 2147483647 + 10 = 2 and (1=2); ERROR 22003: The resulting value is outside the range for the data type INTEGER. Now: ij> – ... and false and ... should get resolved to false select x from s where 2147483647 + 10 = 2 and (1=2); X -----------
          Knut Anders Hatlen created issue -

            People

            • Assignee:
              Knut Anders Hatlen
              Reporter:
              Knut Anders Hatlen
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development