Derby
  1. Derby
  2. DERBY-4087

Clean up debug printing of the abstract syntax trees after parsing, binding and optimization

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Trivial Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 10.6.1.0
    • Component/s: Miscellaneous, SQL
    • Labels:
      None

      Description

      Currently, the printing is often inconsistent:

      • some subtrees not printed
      • wrong indentation due to missing newlines, or lacking level increments
      • redundant printing of subtrees (AST is really a DAG, would be nice to print only once and then refer back to show aliasing)
      • some items printed twice due to inconsistent usage of pattern
      1. derby-4087-statistics.stat
        2 kB
        Dag H. Wanvik
      2. derby-4087-statistics.diff
        802 kB
        Dag H. Wanvik
      3. derby-4087-b.stat
        4 kB
        Dag H. Wanvik
      4. derby-4087-b.diff
        53 kB
        Dag H. Wanvik
      5. dumpquery
        2 kB
        Dag H. Wanvik
      6. sample.log
        26 kB
        Dag H. Wanvik
      7. derby-4087-a.stat
        3 kB
        Dag H. Wanvik
      8. derby-4087-a.diff
        41 kB
        Dag H. Wanvik

        Activity

        Hide
        Dag H. Wanvik added a comment -

        Closing this issue now, since I have put in all the changes I intended. I am sure there are more formatting improvements that could be made, though. If/when we feel the need, we could re-open this issue, or a file new one.

        Show
        Dag H. Wanvik added a comment - Closing this issue now, since I have put in all the changes I intended. I am sure there are more formatting improvements that could be made, though. If/when we feel the need, we could re-open this issue, or a file new one.
        Hide
        Dag H. Wanvik added a comment -

        Committed follow-up patch derby-4087-statistics as svn 809632 on trunk.

        Show
        Dag H. Wanvik added a comment - Committed follow-up patch derby-4087-statistics as svn 809632 on trunk.
        Hide
        Dag H. Wanvik added a comment - - edited

        Uploading aother patch for this issue, which re-introduces the changes that had to be rolled back.
        derby-4087-statistics fixes the formatting errors seen by Knut and Bryan. Since I could not
        use java.util.regexp, I hand-coded three patterns in StringUtil.java this time around.
        The reason for all this text massaging is that the qualifiers are pre-formatted before the output indentation level is knows, which lead to the weird formatting seen. The new method
        StringUtil.ensureIndent fixes that problem.

        I had to modify many old harness test masters, but now they all look nicer at least

        Regressions passed ok.

        Show
        Dag H. Wanvik added a comment - - edited Uploading aother patch for this issue, which re-introduces the changes that had to be rolled back. derby-4087-statistics fixes the formatting errors seen by Knut and Bryan. Since I could not use java.util.regexp, I hand-coded three patterns in StringUtil.java this time around. The reason for all this text massaging is that the qualifiers are pre-formatted before the output indentation level is knows, which lead to the weird formatting seen. The new method StringUtil.ensureIndent fixes that problem. I had to modify many old harness test masters, but now they all look nicer at least Regressions passed ok.
        Hide
        Dag H. Wanvik added a comment -

        Reverted the use of java.util.regex in StringUtil.java, as svn 808945.

        Show
        Dag H. Wanvik added a comment - Reverted the use of java.util.regex in StringUtil.java, as svn 808945.
        Hide
        Knut Anders Hatlen added a comment -

        This broke the Tinderbox build, as java.util.regex cannot be used in engine code (the package is not part of Foundation Profile 1.1).
        http://dbtg.foundry.sun.com/derby/test/tinderbox_trunk16/UpdateInfo/808657-buildDetails.txt

        Show
        Knut Anders Hatlen added a comment - This broke the Tinderbox build, as java.util.regex cannot be used in engine code (the package is not part of Foundation Profile 1.1). http://dbtg.foundry.sun.com/derby/test/tinderbox_trunk16/UpdateInfo/808657-buildDetails.txt
        Hide
        Dag H. Wanvik added a comment -

        Temporarily rolled back changes to the result set statistics printing due to broken masters in old harness lang suite:

        java/engine/org/apache/derby/impl/sql/execute/rts/RealDistinctScanStatistics.java
        java/engine/org/apache/derby/impl/sql/execute/rts/RealHashScanStatistics.java
        java/engine/org/apache/derby/impl/sql/execute/rts/RealTableScanStatistics.java

        svn revision 808601. Will make a new patch with modified masters to match the printing improvements.

        Show
        Dag H. Wanvik added a comment - Temporarily rolled back changes to the result set statistics printing due to broken masters in old harness lang suite: java/engine/org/apache/derby/impl/sql/execute/rts/RealDistinctScanStatistics.java java/engine/org/apache/derby/impl/sql/execute/rts/RealHashScanStatistics.java java/engine/org/apache/derby/impl/sql/execute/rts/RealTableScanStatistics.java svn revision 808601. Will make a new patch with modified masters to match the printing improvements.
        Hide
        Dag H. Wanvik added a comment -

        Committed patch "b" as svn 808523. I'll keep it open for a while in case we need adjustments.

        Show
        Dag H. Wanvik added a comment - Committed patch "b" as svn 808523. I'll keep it open for a while in case we need adjustments.
        Hide
        Dag H. Wanvik added a comment -

        Thanks for looking at this guys.

        Uploading a version "b" which:

        • fixes the wrong formatted Knut and Brian saw in the run-time
          statistics. This touches:

        RealDistinctScanStatistics.java,
        RealHashScanStatistics.java,
        RealTableScanStatistics.java
        StringUtil

        • adds the comments Knut requested. And no, no special reason to use AbstractMap, replaced it with Map.
        • adds some more formatting to OrderByList, OrderByColumn and
          IndexColumnOrder.
        • I forgot to mention that this patch also adds a "brief structure mode
          dump" mode, which only prints the tree structure, e.g. thus:

        org.apache.derby.impl.sql.compile.VirtualColumnNode@4fdf11
        sourceColumn:
        org.apache.derby.impl.sql.compile.ResultColumn@1be0369
        expression:
        org.apache.derby.impl.sql.compile.VirtualColumnNode@1d1fc02
        sourceColumn:
        org.apache.derby.impl.sql.compile.ResultColumn@f800db
        expression:
        org.apache.derby.impl.sql.compile.UnaryArithmeticOperatorNode@5364
        operand:
        org.apache.derby.impl.sql.compile.UnaryArithmeticOperatorNode@162f030
        operand:
        org.apache.derby.impl.sql.compile.ColumnReference@870e2f
        source:
        org.apache.derby.impl.sql.compile.ResultColumn@b44eb4
        expression:
        org.apache.derby.impl.sql.compile.VirtualColumnNode@19b2faf
        sourceColumn:
        org.apache.derby.impl.sql.compile.ResultColumn@157bf4c
        expression:
        :

        I am not sure this is has much value. The brief mode is enabled by
        -Dderby.debug.true=DumpBrief. I'll take it out if people think this is
        unnecessary cruft. I find it useful to get an overview sometimes. It
        does show shared (aliased) nodes as in the normal mode.

        Show
        Dag H. Wanvik added a comment - Thanks for looking at this guys. Uploading a version "b" which: fixes the wrong formatted Knut and Brian saw in the run-time statistics. This touches: RealDistinctScanStatistics.java, RealHashScanStatistics.java, RealTableScanStatistics.java StringUtil adds the comments Knut requested. And no, no special reason to use AbstractMap, replaced it with Map. adds some more formatting to OrderByList, OrderByColumn and IndexColumnOrder. I forgot to mention that this patch also adds a "brief structure mode dump" mode, which only prints the tree structure, e.g. thus: org.apache.derby.impl.sql.compile.VirtualColumnNode@4fdf11 sourceColumn: org.apache.derby.impl.sql.compile.ResultColumn@1be0369 expression: org.apache.derby.impl.sql.compile.VirtualColumnNode@1d1fc02 sourceColumn: org.apache.derby.impl.sql.compile.ResultColumn@f800db expression: org.apache.derby.impl.sql.compile.UnaryArithmeticOperatorNode@5364 operand: org.apache.derby.impl.sql.compile.UnaryArithmeticOperatorNode@162f030 operand: org.apache.derby.impl.sql.compile.ColumnReference@870e2f source: org.apache.derby.impl.sql.compile.ResultColumn@b44eb4 expression: org.apache.derby.impl.sql.compile.VirtualColumnNode@19b2faf sourceColumn: org.apache.derby.impl.sql.compile.ResultColumn@157bf4c expression: : I am not sure this is has much value. The brief mode is enabled by -Dderby.debug.true=DumpBrief. I'll take it out if people think this is unnecessary cruft. I find it useful to get an overview sometimes. It does show shared (aliased) nodes as in the normal mode.
        Hide
        Bryan Pendleton added a comment -

        The strange newline handling in the formatting of the stop and start position values is definitely
        present in other places, such as the RUNTIMESTATISTICS displays. I believe it's part of some
        low-level shared code which constructs string versions of the underlying qualifier objects.

        I think it would be nice to fix, but it's definitely present prior to this patch.

        Show
        Bryan Pendleton added a comment - The strange newline handling in the formatting of the stop and start position values is definitely present in other places, such as the RUNTIMESTATISTICS displays. I believe it's part of some low-level shared code which constructs string versions of the underlying qualifier objects. I think it would be nice to fix, but it's definitely present prior to this patch.
        Hide
        Knut Anders Hatlen added a comment -

        The sample log file looks a lot more readable than what I'm used to, so a +1 from me based on visual inspection of the output!

        I looked at the patch briefly, and nothing caught my attention. But perhaps a short comment to explain printedObjectsMap and getPrintedObjectsMap() would be in order. (And was there a reason for using AbstractMap in the signature instead of Map?)

        Looking at the sample log, I see that the output for start/stop position and qualifiers is a bit odd:

        start position:
        null
        stop position:
        null
        qualifiers:
        None

        I don't think this was introduced by your patch, though, as I think I have seen similar oddities before.

        Show
        Knut Anders Hatlen added a comment - The sample log file looks a lot more readable than what I'm used to, so a +1 from me based on visual inspection of the output! I looked at the patch briefly, and nothing caught my attention. But perhaps a short comment to explain printedObjectsMap and getPrintedObjectsMap() would be in order. (And was there a reason for using AbstractMap in the signature instead of Map?) Looking at the sample log, I see that the output for start/stop position and qualifiers is a bit odd: start position: null stop position: null qualifiers: None I don't think this was introduced by your patch, though, as I think I have seen similar oddities before.
        Hide
        Rick Hillegas added a comment -

        Thanks for the patch, Dag. I have tested it with the XmlTreeWalker attached to DERBY-791. The two patches seem to be orthogonal so that's good!

        Show
        Rick Hillegas added a comment - Thanks for the patch, Dag. I have tested it with the XmlTreeWalker attached to DERBY-791 . The two patches seem to be orthogonal so that's good!
        Hide
        Dag H. Wanvik added a comment -

        Attaching a bash script and a sample output which I find useful sometimes for looking at the query trees. It finds the dumped query tree(s) from a given query on derby.log and prints it on std out., and also adds helper vertical lines to help match up tree levels. If an existing database is not given, it will create a sample one on the fly with one table.

        Example:
        dumpquery -pboie 'select * from t order by j' > sample.log

        Show
        Dag H. Wanvik added a comment - Attaching a bash script and a sample output which I find useful sometimes for looking at the query trees. It finds the dumped query tree(s) from a given query on derby.log and prints it on std out., and also adds helper vertical lines to help match up tree levels. If an existing database is not given, it will create a sample one on the fly with one table. Example: dumpquery -pboie 'select * from t order by j' > sample.log
        Hide
        Dag H. Wanvik added a comment -

        Uploading a patch that addresses these issues, derby-4087-a. I have tried to ensure correctness by
        making lots of manual queries and inspecting the dumped trees for readability.
        To me the trees look more consistent and readable with the patch, but there are many changes so its quite possible
        I made some errors in the process. If people think it's OK, further tweaks can be done later now that we have a
        more consistent use of the patterns.
        Since this all concerns code used used for debugging, there are presently no tests.
        The regressions ran OK at least (hardly surprising)

        Show
        Dag H. Wanvik added a comment - Uploading a patch that addresses these issues, derby-4087-a. I have tried to ensure correctness by making lots of manual queries and inspecting the dumped trees for readability. To me the trees look more consistent and readable with the patch, but there are many changes so its quite possible I made some errors in the process. If people think it's OK, further tweaks can be done later now that we have a more consistent use of the patterns. Since this all concerns code used used for debugging, there are presently no tests. The regressions ran OK at least (hardly surprising)
        Hide
        Dag H. Wanvik added a comment - - edited

        I have found some further issues regarding toString, treePrint and
        printSubNodes. The comments only apply to subclasses of
        QueryTreeNode.

        • Some class variables' value are printed with a leading "<label>: "
          in front of the value, others do not have a label, the value is
          printed alone. Usually, the label is the same as the class
          variable's name. Shouldn't this always be the case for uniformity?
        • Some class variable labels are printed even if the values is not,
          in cases where the variables value is null, e.g. "preJoinFL: " in
          SelectNode. I suggest not printing in such cases to save space.
        • Some toString methods call super.toString before formatting the
          class's own member variables, others do it the other way around. Is
          there a rationale for this I wonder? Or is it just accidental? It
          seems the established pattern is that super be called after, except
          for DDL operations, which always call super first, so I suggest
          adjusting toString to those two patterns throughout.

        Non-DDL classes that call super.toString first:
        CoalesceFunctionNode
        CurrentDatetimeOperatorNode
        SpecialFunctionNode
        ExtractOperatorNode

        • Some usages of toString terminate with a newline, most do not. I
          think they should either end with a newline (DDL) or newline + call
          to super.toString (DML). At the root (qua anchor),
          QueryTreeNode.toString just prints "", so effectively for DML, the
          last thing printed will be the newline also.
        • I think all nodes should call super.toString. This could be because
          the author "knew" they are right under QueryTreeNode, and so has
          chosen to optimize the call away (QueryTreeNode.toString prints the
          empty string), or it could be an omission. I think it would be good
          to be consistent here, too.

        Example of nodes that do not call super.toString at all:
        GenerationClauseNode
        OrderByColumn
        TableName

        TableName is possibly an allowable exception, I think, because its
        toString will return a string even in insane builds, so I not sure
        it can tolerate a terminating newline, so I intend to not touch
        this.

        • Some nodes call toString of underlying nodes in their own toString,
          but such subnodes should, according to the contract defined in
          QueryTreeNode, be handled by PrintSubNodes/TreePrint:

        DistinctNode (formats childResult)
        GroupByNode (formats childResult)
        OrderByNode (formats childResult)
        OrderByColumn (formats expression)

        I suggest bringing such instances into line.

        • Vectors are handled specially. Vectors with tree elements are
          subclasses of QueryTreeNodeVector, which contains a generic toString
          method which calls toString of the elements of the vector and
          concatenates them. I suggest changing this to be printSubNodes and
          use series of calls to treePrint instead to comply with the general
          pattern. Subclasses which have no additional members need not
          override toString either.
        • Some subclasses of QueryTreeNodeVector needlessly define their own
          printSubNodes method. I propose to remove those:

        ValueNodeList
        SubqueryList
        PredicateList
        FromList

        this now inherit QueryTreeNodeVector's printSubNodes.

        • One subclasses of QueryTreeNodeVector (TableElementList) rolls its
          own toString to print the elements. I suggest to remove this an
          dinstead rely on the general mechanism outlined above.

        Some classes neglect to print own member variables:
        ResultColumnList

        • Some subclasses of QueryTreeNodeVector define their own treePrint
          method, e.g.

        ResultColumnList

        This seems to be a remnant from a time when ResultColumnList was not
        a subclass of QueryTreeNode, c.f this comment in

        /**

        • This class needs a treePrint method, even though it is not a
        • descendant of QueryTreeNode, because its members contain tree..

        so I intend to remove this.

        Show
        Dag H. Wanvik added a comment - - edited I have found some further issues regarding toString, treePrint and printSubNodes. The comments only apply to subclasses of QueryTreeNode. Some class variables' value are printed with a leading "<label>: " in front of the value, others do not have a label, the value is printed alone. Usually, the label is the same as the class variable's name. Shouldn't this always be the case for uniformity? Some class variable labels are printed even if the values is not, in cases where the variables value is null, e.g. "preJoinFL: " in SelectNode. I suggest not printing in such cases to save space. Some toString methods call super.toString before formatting the class's own member variables, others do it the other way around. Is there a rationale for this I wonder? Or is it just accidental? It seems the established pattern is that super be called after , except for DDL operations, which always call super first , so I suggest adjusting toString to those two patterns throughout. Non-DDL classes that call super.toString first: CoalesceFunctionNode CurrentDatetimeOperatorNode SpecialFunctionNode ExtractOperatorNode Some usages of toString terminate with a newline, most do not. I think they should either end with a newline (DDL) or newline + call to super.toString (DML). At the root (qua anchor), QueryTreeNode.toString just prints "", so effectively for DML, the last thing printed will be the newline also. I think all nodes should call super.toString. This could be because the author "knew" they are right under QueryTreeNode, and so has chosen to optimize the call away (QueryTreeNode.toString prints the empty string), or it could be an omission. I think it would be good to be consistent here, too. Example of nodes that do not call super.toString at all: GenerationClauseNode OrderByColumn TableName TableName is possibly an allowable exception, I think, because its toString will return a string even in insane builds, so I not sure it can tolerate a terminating newline, so I intend to not touch this. Some nodes call toString of underlying nodes in their own toString, but such subnodes should, according to the contract defined in QueryTreeNode, be handled by PrintSubNodes/TreePrint: DistinctNode (formats childResult) GroupByNode (formats childResult) OrderByNode (formats childResult) OrderByColumn (formats expression) I suggest bringing such instances into line. Vectors are handled specially. Vectors with tree elements are subclasses of QueryTreeNodeVector, which contains a generic toString method which calls toString of the elements of the vector and concatenates them. I suggest changing this to be printSubNodes and use series of calls to treePrint instead to comply with the general pattern. Subclasses which have no additional members need not override toString either. Some subclasses of QueryTreeNodeVector needlessly define their own printSubNodes method. I propose to remove those: ValueNodeList SubqueryList PredicateList FromList this now inherit QueryTreeNodeVector's printSubNodes. One subclasses of QueryTreeNodeVector (TableElementList) rolls its own toString to print the elements. I suggest to remove this an dinstead rely on the general mechanism outlined above. Some classes neglect to print own member variables: ResultColumnList Some subclasses of QueryTreeNodeVector define their own treePrint method, e.g. ResultColumnList This seems to be a remnant from a time when ResultColumnList was not a subclass of QueryTreeNode, c.f this comment in /** This class needs a treePrint method, even though it is not a descendant of QueryTreeNode, because its members contain tree.. so I intend to remove this.
        Hide
        Knut Anders Hatlen added a comment -

        I think truncating already seen subtrees is a good idea as it (a) reduces the size of the printed tree and (b) makes it easier to see that two trees are actually the same one. Doing it always is fine with me. As you mentioned, the current output is often inconsistent, and I find it unlikely that anyone rely on the exact output, so I don't see a need for a knob. It is debug output, after all. If someone does need a stable, parsable output, DERBY-791 sounds like a better fit.

        Show
        Knut Anders Hatlen added a comment - I think truncating already seen subtrees is a good idea as it (a) reduces the size of the printed tree and (b) makes it easier to see that two trees are actually the same one. Doing it always is fine with me. As you mentioned, the current output is often inconsistent, and I find it unlikely that anyone rely on the exact output, so I don't see a need for a knob. It is debug output, after all. If someone does need a stable, parsable output, DERBY-791 sounds like a better fit.
        Hide
        Dag H. Wanvik added a comment -

        I could imagine printing already seen subtrees something like this:

        resultColumns:
        org.apache.derby.impl.sql.compile.ResultColumnList@1e1be92

        org.apache.derby.impl.sql.compile.ResultColumn@1038de7
        **truncated**

        org.apache.derby.impl.sql.compile.ResultColumn@1490eb5
        **truncated**

        Is this helpful? Searching backward in the printout would then reveal these subtrees fully printed.
        Do we want a new knob to enable this, or could it be done always?

        Show
        Dag H. Wanvik added a comment - I could imagine printing already seen subtrees something like this: resultColumns: org.apache.derby.impl.sql.compile.ResultColumnList@1e1be92 org.apache.derby.impl.sql.compile.ResultColumn@1038de7 ** truncated ** org.apache.derby.impl.sql.compile.ResultColumn@1490eb5 ** truncated ** Is this helpful? Searching backward in the printout would then reveal these subtrees fully printed. Do we want a new knob to enable this, or could it be done always?

          People

          • Assignee:
            Dag H. Wanvik
            Reporter:
            Dag H. Wanvik
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development