Derby
  1. Derby
  2. DERBY-4421

Allow Visitors to process the nodes bottom-up

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 10.6.1.0
    • Fix Version/s: 10.6.1.0
    • Component/s: SQL
    • Labels:
      None

      Description

      Currently, QueryTreeNode.accept() walks the tree top-down, always calling
      visit() on the parent before it calls visit() on the children. Although this
      is fine in most cases, there are use cases where visiting the nodes
      bottom-up would be better. One example is mentioned in DERBY-4416. The
      visitor posted there looks for binary comparison operators and checks
      whether both operands are constants. If they are, the operator is replaced
      with a boolean constant.

      Take this expression as an example: (1<2)=(2>1)

      The query tree looks like this:

      =
      / \
      / \
      < >
      / \ / \
      / \ / \
      1 2 2 1

      If we walk the tree top-down with the said visitor, the = node doesn't have
      constant operands when it's visited. The < and > operators do have constant
      operands, and they're both replaced with constant TRUE. This means the
      expression (1<2)=(2>1) is rewritten to TRUE=TRUE, and that's how far the
      transformation goes.

      If the tree had been processed bottom-up, we would start with the < and >
      operators, and again replace them with TRUE. The query tree would therefore
      have been transformed to this intermediate form when the = operator was
      visited:

      =
      / \
      / \
      TRUE TRUE

      This is the same as the end result when visiting top-down, but now the =
      operator hasn't been visited yet. Since both the operands of the = operator
      are constants, the visitor will perform yet another transformation so the
      tree is simplified further and ends up as:

      TRUE

      1. d4421-2a.stat
        2 kB
        Knut Anders Hatlen
      2. d4421-2a.diff
        16 kB
        Knut Anders Hatlen
      3. d4421-1a.stat
        3 kB
        Knut Anders Hatlen
      4. d4421-1a.diff
        40 kB
        Knut Anders Hatlen

        Issue Links

          Activity

          Knut Anders Hatlen created issue -
          Hide
          Knut Anders Hatlen added a comment -

          I propose that we add a method to the Visitor interface to control whether the visitor should visit parents or children first:

          /**

          • Method that is called to see if {@code visit()}

            should be called on

          • the children of {@code node} before it is called on {@code node}

            itself.

          • If this method always returns {@code true}, the visitor will walk the
            * tree bottom-up. If it always returns {@code false}, the tree is visited
            * top-down.
            *
            * @param node the top node of a sub-tree about to be visited
            * @return {@code true}

            if

            {@code node}'s children should be visited
            * before {@code node}

            ,

            {@code false}

            otherwise
            */
            boolean visitChildrenFirst(Visitable node);

          QueryTreeNode.accept() must be changed to check the return value of this method, and all the existing visitors must implement the method (should return false, since all the current visitors walk the tree top-down).

          Show
          Knut Anders Hatlen added a comment - I propose that we add a method to the Visitor interface to control whether the visitor should visit parents or children first: /** Method that is called to see if {@code visit()} should be called on the children of {@code node} before it is called on {@code node} itself. If this method always returns {@code true}, the visitor will walk the * tree bottom-up. If it always returns {@code false}, the tree is visited * top-down. * * @param node the top node of a sub-tree about to be visited * @return {@code true} if {@code node}'s children should be visited * before {@code node} , {@code false} otherwise */ boolean visitChildrenFirst(Visitable node); QueryTreeNode.accept() must be changed to check the return value of this method, and all the existing visitors must implement the method (should return false, since all the current visitors walk the tree top-down).
          Hide
          Rick Hillegas added a comment -

          +1 to this idea.

          Show
          Rick Hillegas added a comment - +1 to this idea.
          Hide
          Dag H. Wanvik added a comment -

          +1.

          Show
          Dag H. Wanvik added a comment - +1.
          Knut Anders Hatlen made changes -
          Field Original Value New Value
          Assignee Knut Anders Hatlen [ knutanders ]
          Knut Anders Hatlen made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          Hide
          Knut Anders Hatlen added a comment -

          Attached is a patch with the following code changes:

          1) Adds a method visitChildrenFirst() to the Visitor interface and implements it as "return false" for all existing visitor classes.

          2) QueryTreeNode.accept() checks visitChildrenFirst() to see if visit() should be called on the parent or the children first.

          3) Makes QueryTreeNode.accept() final to prevent sub-classes from overriding it, as classes that override it would need to duplicate the logic. Duplicated code tends to be more difficult to maintain and more error-prone and should be avoided.

          4) Adds an empty acceptChildren() method to QueryTreeNode. This method is called by QueryTreeNode.accept(), and sub-classes should now override this method instead of accept().

          5) Replaces all accept() overrides in sub-classes of QueryTreeNode with acceptChildren().

          Most of these steps were just mechanical transformations. There's one exception from the rule: FromList.accept() was removed instead of replaced with an acceptChildren() method. That's because the old accept() method did the exact same thing as the parent's (QueryTreeNodeVector) accept() method did. This worked, and didn't cause nodes to be visited twice, because FromList.accept() didn't follow the established pattern and call super.accept(). There's however no reason to implement accept() or acceptChildren() in FromList, so I found it more consistent to remove it.

          The patch removes more code than it adds (215 lines added, 359 lines removed), so I think it's a good clean-up. Since the checking of Visitor.stopTraversal() has been centralized to QueryTreeNode.accept(), we could also remove the now redundant calls to stopTraversal() in all the acceptChildren() methods. But I'd prefer to do that in a follow-up patch in order to keep this patch smaller and easier to understand.

          The regression tests ran cleanly with an earlier version of the patch. The patch had to be refreshed because DERBY-3634 has added one new visitor and two new accept() methods after that. Rerunning regression tests now.

          I also tested the patch together with the patch from DERBY-4416. If the visitor is told to traverse the tree bottom-up, the patch is now able to optimize away an WHERE clause that contains (1=1)=(1=1), so that no project-restrict result set is generated for this statement:

          SELECT * FROM T WHERE (1=1)=(1=1)

          Without this patch, there would be a project-restrict result set on top of the table scan with a (TRUE=TRUE) restriction.

          Show
          Knut Anders Hatlen added a comment - Attached is a patch with the following code changes: 1) Adds a method visitChildrenFirst() to the Visitor interface and implements it as "return false" for all existing visitor classes. 2) QueryTreeNode.accept() checks visitChildrenFirst() to see if visit() should be called on the parent or the children first. 3) Makes QueryTreeNode.accept() final to prevent sub-classes from overriding it, as classes that override it would need to duplicate the logic. Duplicated code tends to be more difficult to maintain and more error-prone and should be avoided. 4) Adds an empty acceptChildren() method to QueryTreeNode. This method is called by QueryTreeNode.accept(), and sub-classes should now override this method instead of accept(). 5) Replaces all accept() overrides in sub-classes of QueryTreeNode with acceptChildren(). Most of these steps were just mechanical transformations. There's one exception from the rule: FromList.accept() was removed instead of replaced with an acceptChildren() method. That's because the old accept() method did the exact same thing as the parent's (QueryTreeNodeVector) accept() method did. This worked, and didn't cause nodes to be visited twice, because FromList.accept() didn't follow the established pattern and call super.accept(). There's however no reason to implement accept() or acceptChildren() in FromList, so I found it more consistent to remove it. The patch removes more code than it adds (215 lines added, 359 lines removed), so I think it's a good clean-up. Since the checking of Visitor.stopTraversal() has been centralized to QueryTreeNode.accept(), we could also remove the now redundant calls to stopTraversal() in all the acceptChildren() methods. But I'd prefer to do that in a follow-up patch in order to keep this patch smaller and easier to understand. The regression tests ran cleanly with an earlier version of the patch. The patch had to be refreshed because DERBY-3634 has added one new visitor and two new accept() methods after that. Rerunning regression tests now. I also tested the patch together with the patch from DERBY-4416 . If the visitor is told to traverse the tree bottom-up, the patch is now able to optimize away an WHERE clause that contains (1=1)=(1=1), so that no project-restrict result set is generated for this statement: SELECT * FROM T WHERE (1=1)=(1=1) Without this patch, there would be a project-restrict result set on top of the table scan with a (TRUE=TRUE) restriction.
          Knut Anders Hatlen made changes -
          Attachment d4421-1a.diff [ 12423194 ]
          Attachment d4421-1a.stat [ 12423195 ]
          Hide
          Rick Hillegas added a comment -

          Thanks for the patch, Knut. Looks good. A couple comments:

          1) Why does acceptChildren() have package access rather than public or protected access?

          2) As a follow-on effort, it might be good to make QueryTreeNode.accept() abstract in order to force node designers to think about how new nodes should be walked.

          Thanks,
          -Rick

          Show
          Rick Hillegas added a comment - Thanks for the patch, Knut. Looks good. A couple comments: 1) Why does acceptChildren() have package access rather than public or protected access? 2) As a follow-on effort, it might be good to make QueryTreeNode.accept() abstract in order to force node designers to think about how new nodes should be walked. Thanks, -Rick
          Hide
          Knut Anders Hatlen added a comment -

          Thanks for looking at the patch, Rick. My responses to your questions
          follow below.

          > 1) Why does acceptChildren() have package access rather than public
          > or protected access?

          The reason why it's not public is that it's only meant to be called
          from QueryTreeNode and its subclasses. Others should access it
          indirectly via the methods in the Visitable interface.

          Protected access is more liberal than package access and would allow
          the method to be overridden by sub-classes in other packages, but
          since all the sub-classes of QueryTreeNode are in the same package, I
          chose the stricter one.

          > 2) As a follow-on effort, it might be good to make
          > QueryTreeNode.accept() abstract in order to force node designers to
          > think about how new nodes should be walked.

          I'm not sure I follow you. Wouldn't that mean that the logic in
          accept() would have to be duplicated in each sub-class of
          QueryTreeNode?

          Show
          Knut Anders Hatlen added a comment - Thanks for looking at the patch, Rick. My responses to your questions follow below. > 1) Why does acceptChildren() have package access rather than public > or protected access? The reason why it's not public is that it's only meant to be called from QueryTreeNode and its subclasses. Others should access it indirectly via the methods in the Visitable interface. Protected access is more liberal than package access and would allow the method to be overridden by sub-classes in other packages, but since all the sub-classes of QueryTreeNode are in the same package, I chose the stricter one. > 2) As a follow-on effort, it might be good to make > QueryTreeNode.accept() abstract in order to force node designers to > think about how new nodes should be walked. I'm not sure I follow you. Wouldn't that mean that the logic in accept() would have to be duplicated in each sub-class of QueryTreeNode?
          Hide
          Bryan Pendleton added a comment -

          I think Rick may have been asking whether QueryTreeNode.acceptChildren should
          be abstract. For which sub-classes of QueryTreeNode is the empty implementation
          of acceptChildren correct as is?

          Show
          Bryan Pendleton added a comment - I think Rick may have been asking whether QueryTreeNode.acceptChildren should be abstract. For which sub-classes of QueryTreeNode is the empty implementation of acceptChildren correct as is?
          Hide
          Knut Anders Hatlen added a comment -

          I didn't check all the nodes (QueryTreeNode has 159 sub-classes), but here are some examples on classes that should have an empty implementation of acceptChildren() because they have no visitable children:

          UserTypeConstantNode
          SQLBooleanConstantNode
          UntypedNullConstantNode
          BitConstantNode
          XMLConstantNode
          BooleanConstantNode
          VarbitConstantNode
          NumericConstantNode
          SpecialFunctionNode
          TableName
          TableElementNode
          WindowReferenceNode
          CurrentDatetimeOperatorNode
          CurrentRowLocationNode
          NOPStatementNode
          SetTransactionIsolationNode

          The following classes should also have an empty acceptChildren() method if they currently implement accept() correctly (they do have visitable children, though, so it might be that they actually should have had a non-empty acceptChildren() method):

          GenerationClassNode
          DefaultNode
          PrivilegeNode
          TablePrivilegeNode
          WindowDefinitionNode
          BaseColumnNode
          VirtualColumnNode
          ParameterNode
          ColumnReference
          ExecSPSNode

          I see the point that making acceptChildren() abstract could help us detect some cases of missing overrides. Apart from slightly reducing the amount of code needed, one advantage of having an empty method in QTN instead of an abstract method is that all the overrides will have the same structure (call super.acceptChildren() and then accept() on all added children). With the abstract method, the structure will be different depending on where in the class tree the node is located (super.acceptChildren() cannot be called if none of the super-classes implement it, but it has to be called if one of them does). This means that each time you implement an acceptChildren() method, you'll have to check all sub-classes of the node in which you implement it, and add a call to super.acceptChildren() in each of the sub-classes' implementations. This could be easy to forget.

          The two approaches protect against different classes of errors, so I can be convinced either way. For now, I'll just commit the patch as it is, since most of it will stay the same regardless of which approach we choose.

          Show
          Knut Anders Hatlen added a comment - I didn't check all the nodes (QueryTreeNode has 159 sub-classes), but here are some examples on classes that should have an empty implementation of acceptChildren() because they have no visitable children: UserTypeConstantNode SQLBooleanConstantNode UntypedNullConstantNode BitConstantNode XMLConstantNode BooleanConstantNode VarbitConstantNode NumericConstantNode SpecialFunctionNode TableName TableElementNode WindowReferenceNode CurrentDatetimeOperatorNode CurrentRowLocationNode NOPStatementNode SetTransactionIsolationNode The following classes should also have an empty acceptChildren() method if they currently implement accept() correctly (they do have visitable children, though, so it might be that they actually should have had a non-empty acceptChildren() method): GenerationClassNode DefaultNode PrivilegeNode TablePrivilegeNode WindowDefinitionNode BaseColumnNode VirtualColumnNode ParameterNode ColumnReference ExecSPSNode I see the point that making acceptChildren() abstract could help us detect some cases of missing overrides. Apart from slightly reducing the amount of code needed, one advantage of having an empty method in QTN instead of an abstract method is that all the overrides will have the same structure (call super.acceptChildren() and then accept() on all added children). With the abstract method, the structure will be different depending on where in the class tree the node is located (super.acceptChildren() cannot be called if none of the super-classes implement it, but it has to be called if one of them does). This means that each time you implement an acceptChildren() method, you'll have to check all sub-classes of the node in which you implement it, and add a call to super.acceptChildren() in each of the sub-classes' implementations. This could be easy to forget. The two approaches protect against different classes of errors, so I can be convinced either way. For now, I'll just commit the patch as it is, since most of it will stay the same regardless of which approach we choose.
          Hide
          Knut Anders Hatlen added a comment -

          Committed d4421-1a.diff with revision 830154.

          Show
          Knut Anders Hatlen added a comment - Committed d4421-1a.diff with revision 830154.
          Hide
          Bryan Pendleton added a comment -

          Thanks Knut for taking the time to work through the details. I think this is a good
          change and I'm excited about the improved optimization behaviors!

          Show
          Bryan Pendleton added a comment - Thanks Knut for taking the time to work through the details. I think this is a good change and I'm excited about the improved optimization behaviors!
          Hide
          Dag H. Wanvik added a comment -

          I think the regular structure provided by the empty method, and the saved code in nodes with no kids, is the stronger argument here. I noticed similar advantages when working with treePrint/printSubNodes. The usage pattern should be clearly documented in QTN, though, since it is slight less intuitive.

          Show
          Dag H. Wanvik added a comment - I think the regular structure provided by the empty method, and the saved code in nodes with no kids, is the stronger argument here. I noticed similar advantages when working with treePrint/printSubNodes. The usage pattern should be clearly documented in QTN, though, since it is slight less intuitive.
          Hide
          Knut Anders Hatlen added a comment -

          Dag, when you say that the pattern should be clearly documented in QTN, did you have something else in mind than the current javadoc for QTN.acceptChildren()? It currently says that all overrides should call super.acceptChildren(), but I could make the comment more verbose if you think that's appropriate.

          Show
          Knut Anders Hatlen added a comment - Dag, when you say that the pattern should be clearly documented in QTN, did you have something else in mind than the current javadoc for QTN.acceptChildren()? It currently says that all overrides should call super.acceptChildren(), but I could make the comment more verbose if you think that's appropriate.
          Hide
          Dag H. Wanvik added a comment -

          I checked the Javadoc on QTN.acceptChildren, I think it is fine.

          Show
          Dag H. Wanvik added a comment - I checked the Javadoc on QTN.acceptChildren, I think it is fine.
          Hide
          Knut Anders Hatlen added a comment -

          Attaching patch 2a which removes the now redundant calls to stopTraversal() in acceptChildren(). The calls are redundant because QTN.accept() always checks stopTraversal() before calling visit(). I found a couple of instances where subclasses had forgotten to check stopTraversal(), so I believe the centralized check in QTN.accept() is more robust than relying on each subclass to check it.

          All the regression tests ran cleanly.

          Show
          Knut Anders Hatlen added a comment - Attaching patch 2a which removes the now redundant calls to stopTraversal() in acceptChildren(). The calls are redundant because QTN.accept() always checks stopTraversal() before calling visit(). I found a couple of instances where subclasses had forgotten to check stopTraversal(), so I believe the centralized check in QTN.accept() is more robust than relying on each subclass to check it. All the regression tests ran cleanly.
          Knut Anders Hatlen made changes -
          Attachment d4421-2a.diff [ 12423566 ]
          Attachment d4421-2a.stat [ 12423567 ]
          Knut Anders Hatlen made changes -
          Issue & fix info [Patch Available]
          Knut Anders Hatlen made changes -
          Link This issue blocks DERBY-4416 [ DERBY-4416 ]
          Hide
          Dag H. Wanvik added a comment -

          +1 to centralizing the stopTraversal check.

          Show
          Dag H. Wanvik added a comment - +1 to centralizing the stopTraversal check.
          Hide
          Knut Anders Hatlen added a comment -

          Committed patch 2a with revision 831244.
          This concludes the planned work on this issue, so I'm marking it as resolved.

          Show
          Knut Anders Hatlen added a comment - Committed patch 2a with revision 831244. This concludes the planned work on this issue, so I'm marking it as resolved.
          Knut Anders Hatlen made changes -
          Status In Progress [ 3 ] Resolved [ 5 ]
          Issue & fix info [Patch Available]
          Fix Version/s 10.6.0.0 [ 12313727 ]
          Resolution Fixed [ 1 ]
          Knut Anders Hatlen made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Gavin made changes -
          Workflow jira [ 12480265 ] Default workflow, editable Closed status [ 12799609 ]

            People

            • Assignee:
              Knut Anders Hatlen
              Reporter:
              Knut Anders Hatlen
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development