Derby
  1. Derby
  2. DERBY-6211

Make Optimizer trace logic pluggable.

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.11.1.1
    • Fix Version/s: None
    • Component/s: SQL
    • Urgency:
      Normal

      Description

      Right now the trace logic in the optimizer is hard-coded to produce a stream of diagnostics. It would be good to be able to plug alternative trace logic into the optimizer. This would make the following possible:

      1) Plug in trace logic which produces formats which are easier to study and which can be analyzed mechanically. E.g., xml formatted output.

      2) Plug in trace logic which can be used during unit testing to verify that the optimizer has picked the right plan. Over time this might make it easier to migrate canon-based tests to assertion-based tests.

      1. derby-6211-01-aa-createPlugin.diff
        74 kB
        Rick Hillegas
      2. derby-6211-02-aa-cleanup.diff
        28 kB
        Rick Hillegas
      3. derby-6211-02-ab-cleanup.diff
        28 kB
        Rick Hillegas
      4. derby-6211-03-aa-customTracer.diff
        17 kB
        Rick Hillegas
      5. derby-6211-04-aa-moveOptimizerTracerToEngineJar.diff
        15 kB
        Rick Hillegas
      6. derby-6211-05-aa-xmlOptimizerTracer.diff
        46 kB
        Rick Hillegas
      7. derby-6211-06-ab-packageProtect-XMLOptTrace.diff
        0.6 kB
        Rick Hillegas
      8. derby-6211-07-aa-useSchemaQualifiedNamesInSummaries.diff
        10 kB
        Rick Hillegas
      9. derby-6211-07-ab-useSchemaQualifiedNamesInSummaries.diff
        10 kB
        Rick Hillegas
      10. derby-6211-08-aa-fixNPE.diff
        2 kB
        Rick Hillegas
      11. derby-6211-09-aa-addTests.diff
        35 kB
        Rick Hillegas
      12. derby-6211-10-aa-makingCostEstimateObject.diff
        11 kB
        Rick Hillegas
      13. derby-6211-11-aa-moveTracerOutOfOptimizer.diff
        18 kB
        Rick Hillegas
      14. derby-6211-11-ab-moveTracerOutOfOptimizer.diff
        18 kB
        Rick Hillegas
      15. derby-6211-12-aa-traceEndOfQueryBlock.diff
        23 kB
        Rick Hillegas
      16. derby-6211-13-aa-SelectNode_optimizer.diff
        7 kB
        Rick Hillegas

        Issue Links

          Activity

          Hide
          Rick Hillegas added a comment -

          Attaching derby-6211-01-aa-createPlugin.diff. This patch abstracts out the optimizer trace logic so that other tracers can be plugged in. I am running tests now.

          This patch makes the following changes:

          1) Adds a new interface OptTrace, which describes the trace operations performed by the Optimizer.

          2) Removes the generic trace dispatcher from Level2OptimizerImpl. The giant switch statement is removed and the individual cases are moved into separate methods in DefaultOptTrace, the default implementation of OptTrace. DefaultOptTrace produces the familiar, current optimizer trace output. The generic, opaque, weakly typed trace() method is removed and replaced by the strongly typed methods in OptTrace.

          3) Removes the trace flag constants from Optimizer.

          4) Makes trace callers use the new strongly typed methods in OptTrace.

          I have run some hand tests. With and without this patch, the output of optimizer tracing looks the same.

          Follow-on patches can introduce alternative trace plugins.

          Touches the following files:

          ----------------------

          A java/engine/org/apache/derby/iapi/sql/compile/OptTrace.java

          Changes for (1).

          ----------------------

          A java/engine/org/apache/derby/impl/sql/compile/DefaultOptTrace.java
          M java/engine/org/apache/derby/impl/sql/compile/Level2OptimizerImpl.java

          Changes for (2).

          ----------------------

          M java/engine/org/apache/derby/iapi/sql/compile/Optimizer.java

          Changes for (3).

          ----------------------

          M java/engine/org/apache/derby/impl/sql/compile/FromTable.java
          M java/engine/org/apache/derby/impl/sql/compile/NestedLoopJoinStrategy.java
          M java/engine/org/apache/derby/impl/sql/compile/JoinNode.java
          M java/engine/org/apache/derby/impl/sql/compile/HashJoinStrategy.java
          M java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java
          M java/engine/org/apache/derby/impl/sql/compile/OptimizerImpl.java

          Changes for (4).

          Show
          Rick Hillegas added a comment - Attaching derby-6211-01-aa-createPlugin.diff. This patch abstracts out the optimizer trace logic so that other tracers can be plugged in. I am running tests now. This patch makes the following changes: 1) Adds a new interface OptTrace, which describes the trace operations performed by the Optimizer. 2) Removes the generic trace dispatcher from Level2OptimizerImpl. The giant switch statement is removed and the individual cases are moved into separate methods in DefaultOptTrace, the default implementation of OptTrace. DefaultOptTrace produces the familiar, current optimizer trace output. The generic, opaque, weakly typed trace() method is removed and replaced by the strongly typed methods in OptTrace. 3) Removes the trace flag constants from Optimizer. 4) Makes trace callers use the new strongly typed methods in OptTrace. I have run some hand tests. With and without this patch, the output of optimizer tracing looks the same. Follow-on patches can introduce alternative trace plugins. Touches the following files: ---------------------- A java/engine/org/apache/derby/iapi/sql/compile/OptTrace.java Changes for (1). ---------------------- A java/engine/org/apache/derby/impl/sql/compile/DefaultOptTrace.java M java/engine/org/apache/derby/impl/sql/compile/Level2OptimizerImpl.java Changes for (2). ---------------------- M java/engine/org/apache/derby/iapi/sql/compile/Optimizer.java Changes for (3). ---------------------- M java/engine/org/apache/derby/impl/sql/compile/FromTable.java M java/engine/org/apache/derby/impl/sql/compile/NestedLoopJoinStrategy.java M java/engine/org/apache/derby/impl/sql/compile/JoinNode.java M java/engine/org/apache/derby/impl/sql/compile/HashJoinStrategy.java M java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java M java/engine/org/apache/derby/impl/sql/compile/OptimizerImpl.java Changes for (4).
          Hide
          Mamta A. Satoor added a comment -

          Hi Rick, thanks for working on this useful addition to Derby. I was wondering what were you planning on doing as part of following
          **************************
          2) Plug in trace logic which can be used during unit testing to verify that the optimizer has picked the right plan. Over time this might make it easier to migrate canon-based tests to assertion-based tests.
          **************************
          Recently, when I wrote couple optimizer related junit test, I was able to use existing utilities to determine if the optimizer was choosing the correct scan - InListMultiProbeTest:runThreeQueries
          s.execute("call SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1)");
          ...
          rtsp = SQLUtilities.getRuntimeStatisticsParser(s);
          assertTrue(rtsp.usedIndexScan());
          Are you planning on replacing this mechanism with something else? Thanks

          Show
          Mamta A. Satoor added a comment - Hi Rick, thanks for working on this useful addition to Derby. I was wondering what were you planning on doing as part of following ************************** 2) Plug in trace logic which can be used during unit testing to verify that the optimizer has picked the right plan. Over time this might make it easier to migrate canon-based tests to assertion-based tests. ************************** Recently, when I wrote couple optimizer related junit test, I was able to use existing utilities to determine if the optimizer was choosing the correct scan - InListMultiProbeTest:runThreeQueries s.execute("call SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1)"); ... rtsp = SQLUtilities.getRuntimeStatisticsParser(s); assertTrue(rtsp.usedIndexScan()); Are you planning on replacing this mechanism with something else? Thanks
          Hide
          Rick Hillegas added a comment -

          Tests passed cleanly for me on derby-6211-01-aa-createPlugin.diff. Committed at subversion revision 1478347.

          Show
          Rick Hillegas added a comment - Tests passed cleanly for me on derby-6211-01-aa-createPlugin.diff. Committed at subversion revision 1478347.
          Hide
          Rick Hillegas added a comment -

          Hi Mamta,

          I'm not planning to fix anything that isn't broken. I think that the RuntimeStatisticsParser does a good job of analyzing the plans chosen for many queries. But I think that there are limitations in the approach taken by RuntimeStatisticsParser: looking for the presence of specific strings in the runtime statistics output. For instance, the following RuntimeStatisticsParser method

          public boolean usedTableScan(String tableName)

          ...will give you useful information if a table is only scanned once by a query. But if a table is scanned more than once, then the method won't tell you which scan was the table scan. I'm hoping that a trace plugin will be able to give us the detailed structure of the plan. Thanks.

          Show
          Rick Hillegas added a comment - Hi Mamta, I'm not planning to fix anything that isn't broken. I think that the RuntimeStatisticsParser does a good job of analyzing the plans chosen for many queries. But I think that there are limitations in the approach taken by RuntimeStatisticsParser: looking for the presence of specific strings in the runtime statistics output. For instance, the following RuntimeStatisticsParser method public boolean usedTableScan(String tableName) ...will give you useful information if a table is only scanned once by a query. But if a table is scanned more than once, then the method won't tell you which scan was the table scan. I'm hoping that a trace plugin will be able to give us the detailed structure of the plan. Thanks.
          Hide
          Rick Hillegas added a comment -

          Attaching derby-6211-02-aa-cleanup.diff. This patch removes some unused code related to optimizer tracing. This patch also tries to clean up the confusing and brittle division of tracing responsibility currently split between the optimizer itself and the LanguageConnectionContext. I am running tests now.

          The following changes are made:

          1) Removes the unused and unusable support for html formatting of optimizer traces. Once I have implemented an xml-formatted tracer, it should be easy to follow its pattern in order to write an html-formatted tracer--in case anyone really wants html.

          2) Centralizes tracing responsibility in the LanguageConnectionContext. The LCC now has only 3 methods related to optimizer tracing and the optimizer itself retrieves the tracer from the LCC:

          setOptimizerTracer()
          getOptimizerTracer()
          optimizerTracingIsOn()

          Touches the following files:

          ----------------------

          M java/tools/org/apache/derby/impl/tools/optional/OptimizerTracer.java
          M java/engine/org/apache/derby/iapi/sql/compile/OptTrace.java
          M java/engine/org/apache/derby/iapi/sql/conn/LanguageConnectionContext.java
          M java/engine/org/apache/derby/iapi/db/OptimizerTrace.java
          M java/engine/org/apache/derby/impl/sql/GenericStatement.java
          M java/engine/org/apache/derby/impl/sql/compile/DefaultOptTrace.java
          M java/engine/org/apache/derby/impl/sql/compile/Level2OptimizerImpl.java
          M java/engine/org/apache/derby/impl/sql/compile/OptimizerImpl.java
          M java/engine/org/apache/derby/impl/sql/conn/GenericLanguageConnectionContext.java

          Show
          Rick Hillegas added a comment - Attaching derby-6211-02-aa-cleanup.diff. This patch removes some unused code related to optimizer tracing. This patch also tries to clean up the confusing and brittle division of tracing responsibility currently split between the optimizer itself and the LanguageConnectionContext. I am running tests now. The following changes are made: 1) Removes the unused and unusable support for html formatting of optimizer traces. Once I have implemented an xml-formatted tracer, it should be easy to follow its pattern in order to write an html-formatted tracer--in case anyone really wants html. 2) Centralizes tracing responsibility in the LanguageConnectionContext. The LCC now has only 3 methods related to optimizer tracing and the optimizer itself retrieves the tracer from the LCC: setOptimizerTracer() getOptimizerTracer() optimizerTracingIsOn() Touches the following files: ---------------------- M java/tools/org/apache/derby/impl/tools/optional/OptimizerTracer.java M java/engine/org/apache/derby/iapi/sql/compile/OptTrace.java M java/engine/org/apache/derby/iapi/sql/conn/LanguageConnectionContext.java M java/engine/org/apache/derby/iapi/db/OptimizerTrace.java M java/engine/org/apache/derby/impl/sql/GenericStatement.java M java/engine/org/apache/derby/impl/sql/compile/DefaultOptTrace.java M java/engine/org/apache/derby/impl/sql/compile/Level2OptimizerImpl.java M java/engine/org/apache/derby/impl/sql/compile/OptimizerImpl.java M java/engine/org/apache/derby/impl/sql/conn/GenericLanguageConnectionContext.java
          Hide
          Rick Hillegas added a comment -

          Attaching derby-6211-02-ab-cleanup.diff. This second rev of the patch fixes a bug introduced by derby-6211-01-aa-createPlugin.diff but which was obscured by some self-healing code in that patch. I am re-running tests now.

          Show
          Rick Hillegas added a comment - Attaching derby-6211-02-ab-cleanup.diff. This second rev of the patch fixes a bug introduced by derby-6211-01-aa-createPlugin.diff but which was obscured by some self-healing code in that patch. I am re-running tests now.
          Hide
          Rick Hillegas added a comment -

          Tests passed cleanly for me on derby-6211-02-ab-cleanup.diff. Committed at subversion derby-6211-02-ab-cleanup.diff.

          Show
          Rick Hillegas added a comment - Tests passed cleanly for me on derby-6211-02-ab-cleanup.diff. Committed at subversion derby-6211-02-ab-cleanup.diff.
          Hide
          Rick Hillegas added a comment -

          Attaching derby-6211-03-aa-customTracer.diff. This patch adds the ability to plug custom trace logic into the Optimizer. I am running tests now.

          To plug custom trace logic into the Optimizer, do the following:

          call syscs_util.syscs_register_tool
          (
          'optimizerTracing', true, 'custom',
          'my.custom.Classname'
          );

          ...where my.custom.Classname is the name of a class which

          1) implements org.apache.derby.iapi.sql.compile.OptTrace

          2) has a 0-arg constructor

          Then run a couple queries. When you are finished tracing, unload your custom trace logic. If your class implements OptTrace.printToWriter(), then your custom trace output will go to System.out or to a file, depending on the arguments to the unload command. This prints the output to System.out:

          call syscs_util.syscs_register_tool( 'optimizerTracing', false );

          ...and this prints the trace output to z1.txt:

          call syscs_util.syscs_register_tool( 'optimizerTracing', false, 'z1.txt' );

          Touches the following files:

          --------------

          M java/tools/org/apache/derby/impl/tools/optional/OptimizerTracer.java
          M java/tools/org/apache/derby/loc/toolsmessages.properties

          Support for custom optimizer tracing.

          --------------

          A java/testing/org/apache/derbyTesting/functionTests/tests/lang/DummyOptTrace.java
          M java/testing/org/apache/derbyTesting/functionTests/tests/lang/OptionalToolsTest.java

          Tests for this new functionality.

          Show
          Rick Hillegas added a comment - Attaching derby-6211-03-aa-customTracer.diff. This patch adds the ability to plug custom trace logic into the Optimizer. I am running tests now. To plug custom trace logic into the Optimizer, do the following: call syscs_util.syscs_register_tool ( 'optimizerTracing', true, 'custom', 'my.custom.Classname' ); ...where my.custom.Classname is the name of a class which 1) implements org.apache.derby.iapi.sql.compile.OptTrace 2) has a 0-arg constructor Then run a couple queries. When you are finished tracing, unload your custom trace logic. If your class implements OptTrace.printToWriter(), then your custom trace output will go to System.out or to a file, depending on the arguments to the unload command. This prints the output to System.out: call syscs_util.syscs_register_tool( 'optimizerTracing', false ); ...and this prints the trace output to z1.txt: call syscs_util.syscs_register_tool( 'optimizerTracing', false, 'z1.txt' ); Touches the following files: -------------- M java/tools/org/apache/derby/impl/tools/optional/OptimizerTracer.java M java/tools/org/apache/derby/loc/toolsmessages.properties Support for custom optimizer tracing. -------------- A java/testing/org/apache/derbyTesting/functionTests/tests/lang/DummyOptTrace.java M java/testing/org/apache/derbyTesting/functionTests/tests/lang/OptionalToolsTest.java Tests for this new functionality.
          Hide
          Knut Anders Hatlen added a comment -

          After the latest changes, derbytools.jar includes many of the iapi classes that previously were only included in derby.jar. In addition to increasing the size of derbytools.jar (from 245KB to 680KB in a sane build) it causes sealing violations in the nightly tests on some platforms, for example here: http://download.java.net/javadesktop/derby/javadb-5576120-report/javadb-5576120-3641400-details.html

          Show
          Knut Anders Hatlen added a comment - After the latest changes, derbytools.jar includes many of the iapi classes that previously were only included in derby.jar. In addition to increasing the size of derbytools.jar (from 245KB to 680KB in a sane build) it causes sealing violations in the nightly tests on some platforms, for example here: http://download.java.net/javadesktop/derby/javadb-5576120-report/javadb-5576120-3641400-details.html
          Hide
          Rick Hillegas added a comment -

          Tests passed cleanly for me on derby-6211-03-aa-customTracer.diff. Committed at subversion revision 1479552.

          Show
          Rick Hillegas added a comment - Tests passed cleanly for me on derby-6211-03-aa-customTracer.diff. Committed at subversion revision 1479552.
          Hide
          Rick Hillegas added a comment -

          Thanks for catching that, Knut. I think this may mean that OptimizerTracer needs to move out of the tools jar and into the engine jar. The right thing might be to move all of the optional tools into the engine jar since none of them really needs to be in the tools jar. Dag wondered about this in a 2013-01-03 comment on DERBY-6022. For the moment I will just move OptimizerTracer and see if that helps. Thanks.

          Show
          Rick Hillegas added a comment - Thanks for catching that, Knut. I think this may mean that OptimizerTracer needs to move out of the tools jar and into the engine jar. The right thing might be to move all of the optional tools into the engine jar since none of them really needs to be in the tools jar. Dag wondered about this in a 2013-01-03 comment on DERBY-6022 . For the moment I will just move OptimizerTracer and see if that helps. Thanks.
          Hide
          Rick Hillegas added a comment -

          Attaching derby-6211-04-aa-moveOptimizerTracerToEngineJar.diff. This patch moves the OptimizerTracer optional tool out of the tools jar and into the engine jar. I am running tests now.

          With this patch, the tools jar shrinks back to its old size. Hopefully this will address the sealing violations in the compatibility tests.

          Touches the following files:

          ----------------

          D java/tools/org/apache/derby/impl/tools/optional/OptimizerTracer.java
          A + java/engine/org/apache/derby/impl/sql/compile/OptimizerTracer.java
          M java/engine/org/apache/derby/impl/sql/build.xml
          M tools/jar/extraDBMSclasses.properties
          M tools/jar/tools.properties

          Build changes for this move. I added a Java 6 build target to java/engine/org/apache/derby/impl/sql/build.xml because OptimizerTracer requires varargs in order to compile. We can rip out this new target when we deprecate support for Java 5 and CDC.

          ----------------

          M java/engine/org/apache/derby/catalog/Java5SystemProcedures.java

          Look for OptimizerTracer in its new package.

          ----------------

          M java/tools/org/apache/derby/loc/toolsmessages.properties
          M java/engine/org/apache/derby/loc/messages.xml
          M java/shared/org/apache/derby/shared/common/reference/SQLState.java

          Move some OptimizerTracer messages to the engine message file.

          Show
          Rick Hillegas added a comment - Attaching derby-6211-04-aa-moveOptimizerTracerToEngineJar.diff. This patch moves the OptimizerTracer optional tool out of the tools jar and into the engine jar. I am running tests now. With this patch, the tools jar shrinks back to its old size. Hopefully this will address the sealing violations in the compatibility tests. Touches the following files: ---------------- D java/tools/org/apache/derby/impl/tools/optional/OptimizerTracer.java A + java/engine/org/apache/derby/impl/sql/compile/OptimizerTracer.java M java/engine/org/apache/derby/impl/sql/build.xml M tools/jar/extraDBMSclasses.properties M tools/jar/tools.properties Build changes for this move. I added a Java 6 build target to java/engine/org/apache/derby/impl/sql/build.xml because OptimizerTracer requires varargs in order to compile. We can rip out this new target when we deprecate support for Java 5 and CDC. ---------------- M java/engine/org/apache/derby/catalog/Java5SystemProcedures.java Look for OptimizerTracer in its new package. ---------------- M java/tools/org/apache/derby/loc/toolsmessages.properties M java/engine/org/apache/derby/loc/messages.xml M java/shared/org/apache/derby/shared/common/reference/SQLState.java Move some OptimizerTracer messages to the engine message file.
          Hide
          Rick Hillegas added a comment -

          Tests passed cleanly for me on derby-6211-04-aa-moveOptimizerTracerToEngineJar.diff. Committed at subversion revision 1479642.

          Show
          Rick Hillegas added a comment - Tests passed cleanly for me on derby-6211-04-aa-moveOptimizerTracerToEngineJar.diff. Committed at subversion revision 1479642.
          Hide
          Rick Hillegas added a comment - - edited

          Attaching derby-6211-05-aa-xmlOptimizerTracer.diff. This patch adds an optimizer tracer which produces its output in xml. In addition, this patch adds an optional tool for viewing that xml output using SQL. I am running tests now.

          I find that it is very hard to read the existing optimizer traces for the following reasons:

          i) The trace output is not indented to indicate how facts relate to one another.

          ii) In particular, it is hard to piece together the shapes of the query plans which are being evaluated.

          I hope that this xml output is easier to read and understand. The output contains elements which nest like the corresponding optimizer data structures:

          A) statement - This is the text of an SQL statement which needs optimization.

          B) queryBlock - A statement may have many query blocks. For instance, a UNION statement contains many branches, each of which is its own query block. Most subqueries are also independent query blocks. Each query block is optimized in isolation from the others. The isolation goes so far that each query block gets its own optimizer instance.

          C) joinOrder - For each query block, the optimizer may consider several join orders. These are the left-to-right orders in which tables would be accessed at execution time. The optimizer builds up a join order incrementally, starting from the leftmost position and adding more tables as it goes. The optimizer may abandon a join order before it is completely filled out. This happens if the optimizer determines that no completion of the join order can result in a plan that's cheaper than the best plan found so far.

          D) decoration - As the optimizer fills out the join order, it also considers which conglomerate to use for each table and how to join the table to the table to its left. Of course, the leftmost table doesn't join to anything before it, so for the leftmost table the join strategy is always NESTED_LOOP.

          The following other elements appear in the xml output:

          E) planCost - The optimizer evaluates the costs of decorated join orders, including the costs of decorated but partial join orders. The planCost element nests inside the joinOrder element. In addition, each queryBlock contains a best planCost.

          F) decConglomerateCost - This represents the cost of using a particular conglomerate to scan a table. This element nests inside the decoration element.

          In addition to presenting cost information, the planCost element presents two descriptions of the decorated join order being evaluated:

          1) summary - This is meant to be a compact, precise description of the plan which we might consider using in an optimizer override. This description identifies conglomerates by the (often cryptic) names stored in SYS.SYSCONGLOMERATES.

          2) verbose - This is meant to be a more human-readable description of the plan. Tables are identified by their SQL names or by their correlation names in the query. In addition, index column names are included if the table is accessed by an index.

          Although the optimizer considers how tables join leftward, English speakers will want to view the join order rightward. That is how the descriptions are written. In addition, I have introduced the following infix operators to represent the join strategies:

          • NESTED_LOOP
          1. HASH_JOIN

          I have also fully parenthesized the plan descriptions even though Derby only supports left-deep parentheses today. In the future, Derby may support bushy join orders, requiring different parenthesizing. Putting all of this together, here is a sample summary description:

          ((45b300a8-013f-33ba-d007-000003789be8 # b6b2c0ae-013f-33ba-d007-000003789be8) # 67bb80b4-013f-33ba-d007-000003789be8) # d8cd40ba-013f-33ba-d007-000003789be8

          ...and here is the corresponding verbose description:

          ((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4

          For the following query...

          select tablename from sys.systables t, sys.syscolumns c, sys.sysaliases a, sys.syssequences s
          where t.tablename = c.columnname and c.columnname = a.alias and a.alias = s.sequencename

          ...here's a sample verbose plan description:

          ((C # T

          {TABLENAME,SCHEMAID}

          ) # A

          {SCHEMAID,ALIAS,NAMESPACE}

          ) # S

          {SCHEMAID,SEQUENCENAME}

          I think that this xml output is much easier to read than traditional Derby optimizer traces. If you use a browser like Firefox, you can collapse and expand elements in order to expose just the information you want to see.

          This patch also includes an optional tool (optimizerTracingViews) which gives you a SQL view of all of the planCost elements in the xml output. Here's an example of how to use xml optimizer tracing along with this optional viewing tool. Note that the tracer wants a file name argument but the viewer wants a file url argument:

          – turn on xml-based optimizer tracing
          call syscs_util.syscs_register_tool( 'optimizerTracing', true, 'xml' );

          – run this query through the tracer
          select * from tab1, tab2, tab3, tab4 where -tab1.b = tab2.b and tab2.a = tab3.a and tab3.a = tab4.a;

          – turn off optimizer tracing
          call syscs_util.syscs_register_tool( 'optimizerTracing', false, 'z.xml' );

          – load the trace viewer
          call syscs_util.syscs_register_tool( 'optimizerTracingViews', true, 'file:///Users/rh161140/derby/mainline/z.xml' );

          – view the costs of all complete plans
          select estimatedCost, verbose from planCost
          where complete
          order by estimatedCost
          ;

          – unload the trace viewer
          call syscs_util.syscs_register_tool( 'optimizerTracingViews', false );

          Here is the output of the query against the planCost view:

          ESTIMATEDCOST |VERBOSE
          ------------------------------------------------------------
          8595.596000000001 |((APP.TAB1 # APP.TAB2) # APP.TAB4) # APP.TAB3
          8595.596000000001 |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4
          8595.596000000001 |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4
          9114.756000000001 |((APP.TAB1 # APP.TAB2) # APP.TAB4) # APP.TAB3
          9114.756000000001 |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4
          9114.756000000001 |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4
          18224.740429176647 |((APP.TAB3 # APP.TAB2) # APP.TAB4) * APP.TAB1
          18224.740429176647 |((APP.TAB2 # APP.TAB3) # APP.TAB4) * APP.TAB1
          27806.645417006024 |((APP.TAB3 # APP.TAB4) # APP.TAB2) * APP.TAB1
          27806.645417006024 |((APP.TAB2 # APP.TAB4) # APP.TAB3) * APP.TAB1
          27903.296430890066 |((APP.TAB4 # APP.TAB3) # APP.TAB2) * APP.TAB1
          27903.296430890066 |((APP.TAB4 # APP.TAB2) # APP.TAB3) * APP.TAB1
          28336.34888640299 |((APP.TAB4 # APP.TAB3) # APP.TAB2) * APP.TAB1
          28336.34888640299 |((APP.TAB4 # APP.TAB2) # APP.TAB3) * APP.TAB1
          28336.34888640299 |((APP.TAB3 # APP.TAB4) # APP.TAB2) * APP.TAB1
          28336.34888640299 |((APP.TAB3 # APP.TAB2) # APP.TAB4) * APP.TAB1
          28336.34888640299 |((APP.TAB2 # APP.TAB4) # APP.TAB3) * APP.TAB1
          28336.34888640299 |((APP.TAB2 # APP.TAB3) # APP.TAB4) * APP.TAB1

          Here is the full shape of the planCost view:

          (
          text varchar( 32672 ),
          stmtID int,
          qbID int,
          complete boolean,
          summary varchar( 32672 ),
          verbose varchar( 32672 ),
          type varchar( 50 ),
          estimatedCost double,
          estimatedRowCount bigint
          )

          I think this functionality is useful enough now that other people can test-drive it. Before writing regression tests for this patch, I would like to gather feedback from developers about how to improve this basic functionality. For instance, is this readable enough? Is there some information from optimizer traces which you often use but which is missing from the xml output?

          Further improvements could include:

          I) Adding more information to the xml trace. Right now, I have only implemented a subset of the trace methods.

          II) Adding more SQL views for reading the xml trace output.

          Touches the following files:

          ------------------

          M java/engine/org/apache/derby/iapi/sql/compile/OptTrace.java
          M java/engine/org/apache/derby/impl/sql/compile/DefaultOptTrace.java
          M java/engine/org/apache/derby/impl/sql/compile/Level2OptimizerImpl.java
          M java/engine/org/apache/derby/impl/sql/compile/OptimizerImpl.java
          M java/testing/org/apache/derbyTesting/functionTests/tests/lang/DummyOptTrace.java

          Changes to the signatures of some of the trace methods so that they give the xml tracer the information it needs.

          ------------------

          M java/engine/org/apache/derby/iapi/sql/compile/JoinStrategy.java
          M java/engine/org/apache/derby/impl/sql/compile/NestedLoopJoinStrategy.java
          M java/engine/org/apache/derby/impl/sql/compile/HashJoinStrategy.java

          Each JoinStrategy now has an operator symbol for use in planCost descriptions.

          ------------------

          A java/engine/org/apache/derby/impl/sql/compile/XMLOptTrace.java
          M java/engine/org/apache/derby/impl/sql/compile/OptimizerTracer.java

          The new xml trace logic.

          ------------------

          A java/engine/org/apache/derby/impl/sql/compile/OptTraceViewer.java
          M java/engine/org/apache/derby/catalog/Java5SystemProcedures.java
          M tools/jar/extraDBMSclasses.properties

          The new optional tool for viewing xml trace output.

          Show
          Rick Hillegas added a comment - - edited Attaching derby-6211-05-aa-xmlOptimizerTracer.diff. This patch adds an optimizer tracer which produces its output in xml. In addition, this patch adds an optional tool for viewing that xml output using SQL. I am running tests now. I find that it is very hard to read the existing optimizer traces for the following reasons: i) The trace output is not indented to indicate how facts relate to one another. ii) In particular, it is hard to piece together the shapes of the query plans which are being evaluated. I hope that this xml output is easier to read and understand. The output contains elements which nest like the corresponding optimizer data structures: A) statement - This is the text of an SQL statement which needs optimization. B) queryBlock - A statement may have many query blocks. For instance, a UNION statement contains many branches, each of which is its own query block. Most subqueries are also independent query blocks. Each query block is optimized in isolation from the others. The isolation goes so far that each query block gets its own optimizer instance. C) joinOrder - For each query block, the optimizer may consider several join orders. These are the left-to-right orders in which tables would be accessed at execution time. The optimizer builds up a join order incrementally, starting from the leftmost position and adding more tables as it goes. The optimizer may abandon a join order before it is completely filled out. This happens if the optimizer determines that no completion of the join order can result in a plan that's cheaper than the best plan found so far. D) decoration - As the optimizer fills out the join order, it also considers which conglomerate to use for each table and how to join the table to the table to its left. Of course, the leftmost table doesn't join to anything before it, so for the leftmost table the join strategy is always NESTED_LOOP. The following other elements appear in the xml output: E) planCost - The optimizer evaluates the costs of decorated join orders, including the costs of decorated but partial join orders. The planCost element nests inside the joinOrder element. In addition, each queryBlock contains a best planCost. F) decConglomerateCost - This represents the cost of using a particular conglomerate to scan a table. This element nests inside the decoration element. In addition to presenting cost information, the planCost element presents two descriptions of the decorated join order being evaluated: 1) summary - This is meant to be a compact, precise description of the plan which we might consider using in an optimizer override. This description identifies conglomerates by the (often cryptic) names stored in SYS.SYSCONGLOMERATES. 2) verbose - This is meant to be a more human-readable description of the plan. Tables are identified by their SQL names or by their correlation names in the query. In addition, index column names are included if the table is accessed by an index. Although the optimizer considers how tables join leftward, English speakers will want to view the join order rightward. That is how the descriptions are written. In addition, I have introduced the following infix operators to represent the join strategies: NESTED_LOOP HASH_JOIN I have also fully parenthesized the plan descriptions even though Derby only supports left-deep parentheses today. In the future, Derby may support bushy join orders, requiring different parenthesizing. Putting all of this together, here is a sample summary description: ((45b300a8-013f-33ba-d007-000003789be8 # b6b2c0ae-013f-33ba-d007-000003789be8) # 67bb80b4-013f-33ba-d007-000003789be8) # d8cd40ba-013f-33ba-d007-000003789be8 ...and here is the corresponding verbose description: ((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4 For the following query... select tablename from sys.systables t, sys.syscolumns c, sys.sysaliases a, sys.syssequences s where t.tablename = c.columnname and c.columnname = a.alias and a.alias = s.sequencename ...here's a sample verbose plan description: ((C # T {TABLENAME,SCHEMAID} ) # A {SCHEMAID,ALIAS,NAMESPACE} ) # S {SCHEMAID,SEQUENCENAME} I think that this xml output is much easier to read than traditional Derby optimizer traces. If you use a browser like Firefox, you can collapse and expand elements in order to expose just the information you want to see. This patch also includes an optional tool (optimizerTracingViews) which gives you a SQL view of all of the planCost elements in the xml output. Here's an example of how to use xml optimizer tracing along with this optional viewing tool. Note that the tracer wants a file name argument but the viewer wants a file url argument: – turn on xml-based optimizer tracing call syscs_util.syscs_register_tool( 'optimizerTracing', true, 'xml' ); – run this query through the tracer select * from tab1, tab2, tab3, tab4 where -tab1.b = tab2.b and tab2.a = tab3.a and tab3.a = tab4.a; – turn off optimizer tracing call syscs_util.syscs_register_tool( 'optimizerTracing', false, 'z.xml' ); – load the trace viewer call syscs_util.syscs_register_tool( 'optimizerTracingViews', true, 'file:///Users/rh161140/derby/mainline/z.xml' ); – view the costs of all complete plans select estimatedCost, verbose from planCost where complete order by estimatedCost ; – unload the trace viewer call syscs_util.syscs_register_tool( 'optimizerTracingViews', false ); Here is the output of the query against the planCost view: ESTIMATEDCOST |VERBOSE ------------------------------------------------------------ 8595.596000000001 |((APP.TAB1 # APP.TAB2) # APP.TAB4) # APP.TAB3 8595.596000000001 |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4 8595.596000000001 |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4 9114.756000000001 |((APP.TAB1 # APP.TAB2) # APP.TAB4) # APP.TAB3 9114.756000000001 |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4 9114.756000000001 |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4 18224.740429176647 |((APP.TAB3 # APP.TAB2) # APP.TAB4) * APP.TAB1 18224.740429176647 |((APP.TAB2 # APP.TAB3) # APP.TAB4) * APP.TAB1 27806.645417006024 |((APP.TAB3 # APP.TAB4) # APP.TAB2) * APP.TAB1 27806.645417006024 |((APP.TAB2 # APP.TAB4) # APP.TAB3) * APP.TAB1 27903.296430890066 |((APP.TAB4 # APP.TAB3) # APP.TAB2) * APP.TAB1 27903.296430890066 |((APP.TAB4 # APP.TAB2) # APP.TAB3) * APP.TAB1 28336.34888640299 |((APP.TAB4 # APP.TAB3) # APP.TAB2) * APP.TAB1 28336.34888640299 |((APP.TAB4 # APP.TAB2) # APP.TAB3) * APP.TAB1 28336.34888640299 |((APP.TAB3 # APP.TAB4) # APP.TAB2) * APP.TAB1 28336.34888640299 |((APP.TAB3 # APP.TAB2) # APP.TAB4) * APP.TAB1 28336.34888640299 |((APP.TAB2 # APP.TAB4) # APP.TAB3) * APP.TAB1 28336.34888640299 |((APP.TAB2 # APP.TAB3) # APP.TAB4) * APP.TAB1 Here is the full shape of the planCost view: ( text varchar( 32672 ), stmtID int, qbID int, complete boolean, summary varchar( 32672 ), verbose varchar( 32672 ), type varchar( 50 ), estimatedCost double, estimatedRowCount bigint ) I think this functionality is useful enough now that other people can test-drive it. Before writing regression tests for this patch, I would like to gather feedback from developers about how to improve this basic functionality. For instance, is this readable enough? Is there some information from optimizer traces which you often use but which is missing from the xml output? Further improvements could include: I) Adding more information to the xml trace. Right now, I have only implemented a subset of the trace methods. II) Adding more SQL views for reading the xml trace output. Touches the following files: ------------------ M java/engine/org/apache/derby/iapi/sql/compile/OptTrace.java M java/engine/org/apache/derby/impl/sql/compile/DefaultOptTrace.java M java/engine/org/apache/derby/impl/sql/compile/Level2OptimizerImpl.java M java/engine/org/apache/derby/impl/sql/compile/OptimizerImpl.java M java/testing/org/apache/derbyTesting/functionTests/tests/lang/DummyOptTrace.java Changes to the signatures of some of the trace methods so that they give the xml tracer the information it needs. ------------------ M java/engine/org/apache/derby/iapi/sql/compile/JoinStrategy.java M java/engine/org/apache/derby/impl/sql/compile/NestedLoopJoinStrategy.java M java/engine/org/apache/derby/impl/sql/compile/HashJoinStrategy.java Each JoinStrategy now has an operator symbol for use in planCost descriptions. ------------------ A java/engine/org/apache/derby/impl/sql/compile/XMLOptTrace.java M java/engine/org/apache/derby/impl/sql/compile/OptimizerTracer.java The new xml trace logic. ------------------ A java/engine/org/apache/derby/impl/sql/compile/OptTraceViewer.java M java/engine/org/apache/derby/catalog/Java5SystemProcedures.java M tools/jar/extraDBMSclasses.properties The new optional tool for viewing xml trace output.
          Hide
          Rick Hillegas added a comment -

          Tests passed cleanly for me on derby-6211-05-aa-xmlOptimizerTracer.diff. Committed at subversion revision 1491893.

          JIRA was not reachable when I committed this patch. It appears that the "ASF subversion and git services" commit trigger does not buffer up and retry a commit comment if it fails to add the comment immediately after committing.

          Show
          Rick Hillegas added a comment - Tests passed cleanly for me on derby-6211-05-aa-xmlOptimizerTracer.diff. Committed at subversion revision 1491893. JIRA was not reachable when I committed this patch. It appears that the "ASF subversion and git services" commit trigger does not buffer up and retry a commit comment if it fails to add the comment immediately after committing.
          Hide
          Mike Matrigali added a comment -

          Can you describe the impact on derby.jar. In general I would like to see tools go in some other optional jar, and as much as possible separate
          from the mainline server code. I understand for this particular work there is server based work to provide the framework to plug.

          Is there an extra cost (size or execution lines) added for a server that is not interested in optimizer trace?

          Show
          Mike Matrigali added a comment - Can you describe the impact on derby.jar. In general I would like to see tools go in some other optional jar, and as much as possible separate from the mainline server code. I understand for this particular work there is server based work to provide the framework to plug. Is there an extra cost (size or execution lines) added for a server that is not interested in optimizer trace?
          Hide
          Rick Hillegas added a comment -

          Hi Mike,

          The xml-based tracing adds 2 new classes to the engine jar:

          o XMLOptTrace. This is the xml tracer. It needs to be in the engine jar because it references many engine classes. It weighs 23K.

          o OptTraceViewer. This is the code which loads and unloads the optional tool for viewing the xml output. It weighs 4K.

          Here's how xml-based tracing affects Derby processing:

          o If you don't turn on optimizer tracing, then there is no effect. That is because the tracing methods are only called if tracing is turned on. No new tracing methods have been added.

          o If you turn on optimizer tracing and you use the old-style tracer, then there is a tiny effect: one extra (unused) argument apiece has been added to the signatures of two tracing methods. I don't think that this extra argument passing will give rise to any measurable performance drag.

          o If you turn on optimizer tracing and you use xml-based tracing, then for some trace calls, the tracing may create more transient objects. Other trace calls, which are NOPs in the xml-tracer, will create fewer transient objects. I haven't measured the performance effects of turning on either style of tracing.

          Thanks,
          -Rick

          Show
          Rick Hillegas added a comment - Hi Mike, The xml-based tracing adds 2 new classes to the engine jar: o XMLOptTrace. This is the xml tracer. It needs to be in the engine jar because it references many engine classes. It weighs 23K. o OptTraceViewer. This is the code which loads and unloads the optional tool for viewing the xml output. It weighs 4K. Here's how xml-based tracing affects Derby processing: o If you don't turn on optimizer tracing, then there is no effect. That is because the tracing methods are only called if tracing is turned on. No new tracing methods have been added. o If you turn on optimizer tracing and you use the old-style tracer, then there is a tiny effect: one extra (unused) argument apiece has been added to the signatures of two tracing methods. I don't think that this extra argument passing will give rise to any measurable performance drag. o If you turn on optimizer tracing and you use xml-based tracing, then for some trace calls, the tracing may create more transient objects. Other trace calls, which are NOPs in the xml-tracer, will create fewer transient objects. I haven't measured the performance effects of turning on either style of tracing. Thanks, -Rick
          Hide
          Rick Hillegas added a comment -

          Attaching derby-6211-06-ab-packageProtect-XMLOptTrace.diff. This patch reduces the visibility of XMLOptTrace to package visibility.

          Touches the following file:

          M java/engine/org/apache/derby/impl/sql/compile/XMLOptTrace.java

          Show
          Rick Hillegas added a comment - Attaching derby-6211-06-ab-packageProtect-XMLOptTrace.diff. This patch reduces the visibility of XMLOptTrace to package visibility. Touches the following file: M java/engine/org/apache/derby/impl/sql/compile/XMLOptTrace.java
          Hide
          ASF subversion and git services added a comment -

          Commit 1492378 from Rick Hillegas
          [ https://svn.apache.org/r1492378 ]

          DERBY-6211: Commit derby-6211-06-ab-packageProtect-XMLOptTrace.diff, reducing the visibility of XMLOptTrace.

          Show
          ASF subversion and git services added a comment - Commit 1492378 from Rick Hillegas [ https://svn.apache.org/r1492378 ] DERBY-6211 : Commit derby-6211-06-ab-packageProtect-XMLOptTrace.diff, reducing the visibility of XMLOptTrace.
          Hide
          Dag H. Wanvik added a comment - - edited

          I took the XML tracing for a test drive, also using the optional view tool to check the results and I think the functionality is useful. I haven't really done enough work in the optimizer to be able to say what possible extra trace might be useful, but it seems straight forward to add more tracing as we see the need.
          The new notation is helpful to quickly grok what plan has been selected.

          Question: If I do this twice, e.g one with XML, one with standard text, as in this stanza, I see no proper trace in round two. Why? (possible answer below)

          connect 'jdbc:derby:memory:wombat;create=true';
          create table t(i int primary key, j int);
          insert into t values (1,1),(2,2);
          call syscs_util.syscs_register_tool( 'optimizerTracing', true);
          select * from t a, t b;
          call syscs_util.syscs_register_tool( 'optimizerTracing', false, 'plan.txt');
          call syscs_util.syscs_register_tool( 'optimizerTracing', true, 'xml');
          select * from t a, t b;
          call syscs_util.syscs_register_tool( 'optimizerTracing', false, 'plan.xml');

          It doesn't matter if I use xml or traditional trace, the second time, I see no real plan in the output file. I guess the statement cache tricks us here...
          It it a tad confusing for the user... perhaps we could force recompilation the next statement after we register optimizer tracing?

          Show
          Dag H. Wanvik added a comment - - edited I took the XML tracing for a test drive, also using the optional view tool to check the results and I think the functionality is useful. I haven't really done enough work in the optimizer to be able to say what possible extra trace might be useful, but it seems straight forward to add more tracing as we see the need. The new notation is helpful to quickly grok what plan has been selected. Question: If I do this twice, e.g one with XML, one with standard text, as in this stanza, I see no proper trace in round two. Why? (possible answer below) connect 'jdbc:derby:memory:wombat;create=true'; create table t(i int primary key, j int); insert into t values (1,1),(2,2); call syscs_util.syscs_register_tool( 'optimizerTracing', true); select * from t a, t b; call syscs_util.syscs_register_tool( 'optimizerTracing', false, 'plan.txt'); call syscs_util.syscs_register_tool( 'optimizerTracing', true, 'xml'); select * from t a, t b; call syscs_util.syscs_register_tool( 'optimizerTracing', false, 'plan.xml'); It doesn't matter if I use xml or traditional trace, the second time, I see no real plan in the output file. I guess the statement cache tricks us here... It it a tad confusing for the user... perhaps we could force recompilation the next statement after we register optimizer tracing?
          Hide
          Rick Hillegas added a comment -

          Thanks for that feedback, Dag. Concerning the experiment you ran when you traced the same statement twice in a row: I also blundered into this behavior and came to the same conclusion. The statement is recorded in the trace but the optimizer is never called because the plan has already been compiled and cached. As you note, it doesn't matter what kind of tracing you request. A slight enhancement to your suggestion might be to always skip the statement cache when optimizer tracing is on. Thanks.

          Show
          Rick Hillegas added a comment - Thanks for that feedback, Dag. Concerning the experiment you ran when you traced the same statement twice in a row: I also blundered into this behavior and came to the same conclusion. The statement is recorded in the trace but the optimizer is never called because the plan has already been compiled and cached. As you note, it doesn't matter what kind of tracing you request. A slight enhancement to your suggestion might be to always skip the statement cache when optimizer tracing is on. Thanks.
          Hide
          Rick Hillegas added a comment -

          Attaching derby-6211-07-aa-useSchemaQualifiedNamesInSummaries.diff. This patch makes a couple changes to how object names are presented in xml-based optimizer traces.

          This patch makes these significant changes:

          1) Uses fully qualified (schema + object) names to identify conglomerates and table functions in plan summaries.

          2) Prefixes table function names with the [ character in order to distinguish them from the names of indexes.

          3) Fixes some NPEs when tracing queries which involve table functions.

          The idea is to produce plan summaries which are complete enough that they can be used to specify complete plans via the work to be done on DERBY-6267.

          The extra [ character used to distinguish table functions from indexes is cryptic, idiosyncratic, and maybe overkill. Moreover, there's only a low risk that the name spaces would conflict. But I thought it was better to be cautious here. Please let me know if you have another idea for how to distinguish the name spaces.

          Here are sample plan summaries involving tables, views, and table functions. Note that views are replaced with base tables before Derby gets to the optimizer.

          This query...

          select tablename from v, sys.syscolumns where tablename = columnname

          ...produces a summary plan like this:

          "SYS"."SYSCOLUMNS_HEAP" # "SYS"."SYSTABLES_INDEX1"

          This query...

          select columnname from sys.syscolumns c, table( integerList() ) i where c.columnNumber = i.a and i.a < 0

          ...produces a summary plan like this:

          "SYS"."SYSCOLUMNS_HEAP" # ["APP"."INTEGERLIST"

          Touches the following files:

          -------------

          M java/engine/org/apache/derby/impl/sql/compile/TableName.java

          Adds a new method which prints a TableName as a schema-qualified, quoted, and escaped SQL name.

          -------------

          M java/engine/org/apache/derby/impl/sql/compile/XMLOptTrace.java

          Makes the changes described above.

          Show
          Rick Hillegas added a comment - Attaching derby-6211-07-aa-useSchemaQualifiedNamesInSummaries.diff. This patch makes a couple changes to how object names are presented in xml-based optimizer traces. This patch makes these significant changes: 1) Uses fully qualified (schema + object) names to identify conglomerates and table functions in plan summaries. 2) Prefixes table function names with the [ character in order to distinguish them from the names of indexes. 3) Fixes some NPEs when tracing queries which involve table functions. The idea is to produce plan summaries which are complete enough that they can be used to specify complete plans via the work to be done on DERBY-6267 . The extra [ character used to distinguish table functions from indexes is cryptic, idiosyncratic, and maybe overkill. Moreover, there's only a low risk that the name spaces would conflict. But I thought it was better to be cautious here. Please let me know if you have another idea for how to distinguish the name spaces. Here are sample plan summaries involving tables, views, and table functions. Note that views are replaced with base tables before Derby gets to the optimizer. This query... select tablename from v, sys.syscolumns where tablename = columnname ...produces a summary plan like this: "SYS"."SYSCOLUMNS_HEAP" # "SYS"."SYSTABLES_INDEX1" This query... select columnname from sys.syscolumns c, table( integerList() ) i where c.columnNumber = i.a and i.a < 0 ...produces a summary plan like this: "SYS"."SYSCOLUMNS_HEAP" # ["APP"."INTEGERLIST" Touches the following files: ------------- M java/engine/org/apache/derby/impl/sql/compile/TableName.java Adds a new method which prints a TableName as a schema-qualified, quoted, and escaped SQL name. ------------- M java/engine/org/apache/derby/impl/sql/compile/XMLOptTrace.java Makes the changes described above.
          Hide
          Rick Hillegas added a comment -

          Attaching derby-6211-07-ab-useSchemaQualifiedNamesInSummaries.diff. Tests passed cleanly on the previous rev of the patch. The new rev of the patch improves the notation for table functions, based on a suggestion by Dag given over a private channel.

          Dag suggested that "APP"."INTEGERLIST"() would be less cryptic and idiosyncratic than ["APP"."INTEGERLIST". I agree. Thanks, Dag.

          Show
          Rick Hillegas added a comment - Attaching derby-6211-07-ab-useSchemaQualifiedNamesInSummaries.diff. Tests passed cleanly on the previous rev of the patch. The new rev of the patch improves the notation for table functions, based on a suggestion by Dag given over a private channel. Dag suggested that "APP"."INTEGERLIST"() would be less cryptic and idiosyncratic than ["APP"."INTEGERLIST". I agree. Thanks, Dag.
          Hide
          ASF subversion and git services added a comment -

          Commit 1494954 from Rick Hillegas
          [ https://svn.apache.org/r1494954 ]

          DERBY-6211: Use schema-qualified names in plan summaries printed by the xml-based optimizer tracer; commit derby-6211-07-ab-useSchemaQualifiedNamesInSummaries.diff.

          Show
          ASF subversion and git services added a comment - Commit 1494954 from Rick Hillegas [ https://svn.apache.org/r1494954 ] DERBY-6211 : Use schema-qualified names in plan summaries printed by the xml-based optimizer tracer; commit derby-6211-07-ab-useSchemaQualifiedNamesInSummaries.diff.
          Hide
          Rick Hillegas added a comment -

          Attaching derby-6211-08-aa-fixNPE.diff. This patch fixes an NPE in xml-based optimizer tracing. The NPE was introduced by recent work on removing the NodeFactory. The NPE was not discovered because there are no regression tests for xml-based optimizer tracing yet.

          Touches the following file:

          M java/engine/org/apache/derby/impl/sql/compile/XMLOptTrace.java

          Show
          Rick Hillegas added a comment - Attaching derby-6211-08-aa-fixNPE.diff. This patch fixes an NPE in xml-based optimizer tracing. The NPE was introduced by recent work on removing the NodeFactory. The NPE was not discovered because there are no regression tests for xml-based optimizer tracing yet. Touches the following file: M java/engine/org/apache/derby/impl/sql/compile/XMLOptTrace.java
          Hide
          ASF subversion and git services added a comment -

          Commit 1497537 from Rick Hillegas
          [ https://svn.apache.org/r1497537 ]

          DERBY-6211: Fix NPE in xml-based optimizer tracing; committing derby-6211-08-aa-fixNPE.diff.

          Show
          ASF subversion and git services added a comment - Commit 1497537 from Rick Hillegas [ https://svn.apache.org/r1497537 ] DERBY-6211 : Fix NPE in xml-based optimizer tracing; committing derby-6211-08-aa-fixNPE.diff.
          Hide
          Bryan Pendleton added a comment -

          Hi Rick, Do you think it is a complex task to write some tests for XML-based optimizer tracing?

          I am wondering if it would be a good sub-task of DERBY-6164.

          Show
          Bryan Pendleton added a comment - Hi Rick, Do you think it is a complex task to write some tests for XML-based optimizer tracing? I am wondering if it would be a good sub-task of DERBY-6164 .
          Hide
          Rick Hillegas added a comment -

          Hi Bryan,

          I'm not quite ready to write regression tests for xml-based optimizer tracing. I apologize that this creates more gaps for the code-coverage project. I have at least one more change to make to the xml output before I add some tests.

          Right now I'm working on adding compact optimizer overrides for complete query plans (DERBY-6267). The idea is that the plan summaries output by xml-based optimizer tracing will work as plan descriptions in optimizer overrides. I've made a lot of progress on DERBY-6267 and may have a patch to show the community in a couple days.

          Until then, I want to hold off writing tests for xml-based optimizer tracing, just in case the format of the plan summaries changes. For the same reason, I've been holding off writing tests for xml-based execution plans (DERBY-6266). I want to use those ResultSet summaries in the tests which verify the work on DERBY-6267. And there's a possibility that the ResultSet summaries may have to change a bit in order to serve that purpose.

          Thanks,
          -Rick

          Show
          Rick Hillegas added a comment - Hi Bryan, I'm not quite ready to write regression tests for xml-based optimizer tracing. I apologize that this creates more gaps for the code-coverage project. I have at least one more change to make to the xml output before I add some tests. Right now I'm working on adding compact optimizer overrides for complete query plans ( DERBY-6267 ). The idea is that the plan summaries output by xml-based optimizer tracing will work as plan descriptions in optimizer overrides. I've made a lot of progress on DERBY-6267 and may have a patch to show the community in a couple days. Until then, I want to hold off writing tests for xml-based optimizer tracing, just in case the format of the plan summaries changes. For the same reason, I've been holding off writing tests for xml-based execution plans ( DERBY-6266 ). I want to use those ResultSet summaries in the tests which verify the work on DERBY-6267 . And there's a possibility that the ResultSet summaries may have to change a bit in order to serve that purpose. Thanks, -Rick
          Hide
          Rick Hillegas added a comment -

          Attaching derby-6211-09-aa-addTests.diff. This patch adds tests for xml-based optimizer tracing. I am running the full tests now.

          This patch makes a couple changes:

          1) Wraps a privilege block around the file writes performed by optimizer tracing.

          2) Wraps a privilege block around the file reading performed by the XmlVTI.

          3) Changes xml-based optimizer tracing:

          a) Removes the verbose summary element. After other changes to the summary element, I don't think that the verbose element adds much value any more.

          b) Uses OptimizerPlan.toString() to format plan summaries. This should help enforce the contract that summary lines can be used as optimizer overrides.

          4) Adds tests for xml-based optimizer tracing. More extensive tests would be useful but I haven't figured out a way to ensure cross-platform stability for the results of more complicated tests. I am hoping that these tests turn out to be stable when run as part of the full regression test suite.

          While experimenting with these tests, I tripped across some interesting behaviors:

          A) An optimizer trace (either xml-based or otherwise) can show two separate optimizer sessions for a query. The first session happens when you prepare the query. The second happens when the execution logic decides that it needs to re-prepare the query you just prepared. I saw this behavior for a query against tables which I had just stuffed with many rows. Maybe statistics were being automatically calculated between the initial preparation and the actual execution of the query.

          B) The order in which plan shapes appear in the optimizer trace is not deterministic. This may be an instability in the planCost table function. But the instability may be in the optimizer itself. Maybe the optimizer considers plans in an unstable order. That would be weird and troubling.

          Touches the following files:

          ------------

          M java/engine/org/apache/derby/impl/sql/compile/OptimizerTracer.java

          (1)

          ------------

          M java/engine/org/apache/derby/vti/XmlVTI.java

          (2)

          ------------

          M java/engine/org/apache/derby/impl/sql/compile/XMLOptTrace.java

          (3)

          ------------

          A java/testing/org/apache/derbyTesting/functionTests/tests/lang/RSMDWrapper.java
          M java/testing/org/apache/derbyTesting/functionTests/tests/lang/_Suite.java
          A java/testing/org/apache/derbyTesting/functionTests/tests/lang/XMLOptimizerTraceTest.java

          (4). Along the way, I checked in a slightly amended version of the RSMDWrapper table function attached to DERBY-4926. This creates a table function out of the ResultSetMetaData returned by a query; it can be useful in tests which need to verify the shapes of ResultSets.

          Show
          Rick Hillegas added a comment - Attaching derby-6211-09-aa-addTests.diff. This patch adds tests for xml-based optimizer tracing. I am running the full tests now. This patch makes a couple changes: 1) Wraps a privilege block around the file writes performed by optimizer tracing. 2) Wraps a privilege block around the file reading performed by the XmlVTI. 3) Changes xml-based optimizer tracing: a) Removes the verbose summary element. After other changes to the summary element, I don't think that the verbose element adds much value any more. b) Uses OptimizerPlan.toString() to format plan summaries. This should help enforce the contract that summary lines can be used as optimizer overrides. 4) Adds tests for xml-based optimizer tracing. More extensive tests would be useful but I haven't figured out a way to ensure cross-platform stability for the results of more complicated tests. I am hoping that these tests turn out to be stable when run as part of the full regression test suite. While experimenting with these tests, I tripped across some interesting behaviors: A) An optimizer trace (either xml-based or otherwise) can show two separate optimizer sessions for a query. The first session happens when you prepare the query. The second happens when the execution logic decides that it needs to re-prepare the query you just prepared. I saw this behavior for a query against tables which I had just stuffed with many rows. Maybe statistics were being automatically calculated between the initial preparation and the actual execution of the query. B) The order in which plan shapes appear in the optimizer trace is not deterministic. This may be an instability in the planCost table function. But the instability may be in the optimizer itself. Maybe the optimizer considers plans in an unstable order. That would be weird and troubling. Touches the following files: ------------ M java/engine/org/apache/derby/impl/sql/compile/OptimizerTracer.java (1) ------------ M java/engine/org/apache/derby/vti/XmlVTI.java (2) ------------ M java/engine/org/apache/derby/impl/sql/compile/XMLOptTrace.java (3) ------------ A java/testing/org/apache/derbyTesting/functionTests/tests/lang/RSMDWrapper.java M java/testing/org/apache/derbyTesting/functionTests/tests/lang/_Suite.java A java/testing/org/apache/derbyTesting/functionTests/tests/lang/XMLOptimizerTraceTest.java (4). Along the way, I checked in a slightly amended version of the RSMDWrapper table function attached to DERBY-4926 . This creates a table function out of the ResultSetMetaData returned by a query; it can be useful in tests which need to verify the shapes of ResultSets.
          Hide
          ASF subversion and git services added a comment -

          Commit 1500056 from Rick Hillegas
          [ https://svn.apache.org/r1500056 ]

          DERBY-6211: Add basic tests for xml-based optimizer tracing; tests passed cleanly on derby-6211-09-aa-addTests.diff.

          Show
          ASF subversion and git services added a comment - Commit 1500056 from Rick Hillegas [ https://svn.apache.org/r1500056 ] DERBY-6211 : Add basic tests for xml-based optimizer tracing; tests passed cleanly on derby-6211-09-aa-addTests.diff.
          Hide
          Knut Anders Hatlen added a comment -

          Thanks for writing the test, Rick.

          I think the two privilege blocks can be narrowed down. I don't think there's a problem at the moment, but keeping the blocks as small as possible reduces the risk of problematic code sneaking into the privilege blocks by accident later.

          I only found these two lines that needed to run with privileges:

          + FileInputStream is = new FileInputStream( file );

          and

          + pw = new PrintWriter( configurationParameters[ 0 ] );

          (I didn't test, so I may have missed some.)

          Show
          Knut Anders Hatlen added a comment - Thanks for writing the test, Rick. I think the two privilege blocks can be narrowed down. I don't think there's a problem at the moment, but keeping the blocks as small as possible reduces the risk of problematic code sneaking into the privilege blocks by accident later. I only found these two lines that needed to run with privileges: + FileInputStream is = new FileInputStream( file ); and + pw = new PrintWriter( configurationParameters[ 0 ] ); (I didn't test, so I may have missed some.)
          Hide
          Rick Hillegas added a comment -

          Attaching derby-6211-10-aa-makingCostEstimateObject.diff. This patch simplifies the construction of CostEstimate objects and fixes an NPE when running XML-based optimizer tracing against an outer join query. Tests passed cleanly for me on this patch.

          While investigating the optimizer's handling of outer joins, I tripped across 2 problems:

          1) The xml-based optimizer tracer dies on an NPE in trying to assign a table name to the ProjectRestrictNode which sits on top of the outer join node. The fix, for the moment, is to give the ProjectRestrictNode a generic name.

          2) During outer join optimization, other optimizers are created for the sole purpose of asking them to manufacture empty CostEstimate objects. This is wrong for several reasons:

          a) It violates the principle that an optimizer instance corresponds to a query block.

          b) It's inefficient.

          c) It's unnecessary because there is already a method on OptimizerFactory for creating an empty CostEstimate object.

          The fix to (2) is to make Optimizer.newCostEstimate() private, remove the pointless instantiations of new optimizers, and call OptimizerFactory.getCostEstimate() to make empty CostEstimates.

          Touches the following files:

          --------------------

          M java/engine/org/apache/derby/impl/sql/compile/XMLOptTrace.java

          Changes for (1).

          --------------------

          M java/engine/org/apache/derby/iapi/sql/compile/OptimizerFactory.java
          M java/engine/org/apache/derby/iapi/sql/compile/Optimizer.java
          M java/engine/org/apache/derby/impl/sql/compile/FromTable.java
          M java/engine/org/apache/derby/impl/sql/compile/SingleChildResultSetNode.java
          M java/engine/org/apache/derby/impl/sql/compile/DistinctNode.java
          M java/engine/org/apache/derby/impl/sql/compile/ProjectRestrictNode.java
          M java/engine/org/apache/derby/impl/sql/compile/GroupByNode.java
          M java/engine/org/apache/derby/impl/sql/compile/OptimizerFactoryImpl.java
          M java/engine/org/apache/derby/impl/sql/compile/RowResultSetNode.java
          M java/engine/org/apache/derby/impl/sql/compile/CurrentOfNode.java
          M java/engine/org/apache/derby/impl/sql/compile/OptimizerImpl.java
          M java/engine/org/apache/derby/impl/sql/compile/TableOperatorNode.java

          Changes for (2).

          Show
          Rick Hillegas added a comment - Attaching derby-6211-10-aa-makingCostEstimateObject.diff. This patch simplifies the construction of CostEstimate objects and fixes an NPE when running XML-based optimizer tracing against an outer join query. Tests passed cleanly for me on this patch. While investigating the optimizer's handling of outer joins, I tripped across 2 problems: 1) The xml-based optimizer tracer dies on an NPE in trying to assign a table name to the ProjectRestrictNode which sits on top of the outer join node. The fix, for the moment, is to give the ProjectRestrictNode a generic name. 2) During outer join optimization, other optimizers are created for the sole purpose of asking them to manufacture empty CostEstimate objects. This is wrong for several reasons: a) It violates the principle that an optimizer instance corresponds to a query block. b) It's inefficient. c) It's unnecessary because there is already a method on OptimizerFactory for creating an empty CostEstimate object. The fix to (2) is to make Optimizer.newCostEstimate() private, remove the pointless instantiations of new optimizers, and call OptimizerFactory.getCostEstimate() to make empty CostEstimates. Touches the following files: -------------------- M java/engine/org/apache/derby/impl/sql/compile/XMLOptTrace.java Changes for (1). -------------------- M java/engine/org/apache/derby/iapi/sql/compile/OptimizerFactory.java M java/engine/org/apache/derby/iapi/sql/compile/Optimizer.java M java/engine/org/apache/derby/impl/sql/compile/FromTable.java M java/engine/org/apache/derby/impl/sql/compile/SingleChildResultSetNode.java M java/engine/org/apache/derby/impl/sql/compile/DistinctNode.java M java/engine/org/apache/derby/impl/sql/compile/ProjectRestrictNode.java M java/engine/org/apache/derby/impl/sql/compile/GroupByNode.java M java/engine/org/apache/derby/impl/sql/compile/OptimizerFactoryImpl.java M java/engine/org/apache/derby/impl/sql/compile/RowResultSetNode.java M java/engine/org/apache/derby/impl/sql/compile/CurrentOfNode.java M java/engine/org/apache/derby/impl/sql/compile/OptimizerImpl.java M java/engine/org/apache/derby/impl/sql/compile/TableOperatorNode.java Changes for (2).
          Hide
          ASF subversion and git services added a comment -

          Commit 1509671 from Rick Hillegas in branch 'code/trunk'
          [ https://svn.apache.org/r1509671 ]

          DERBY-6211: Don't instantiate new optimizers just to create empty CostEstimate objects; commit derby-6211-10-aa-makingCostEstimateObject.diff.

          Show
          ASF subversion and git services added a comment - Commit 1509671 from Rick Hillegas in branch 'code/trunk' [ https://svn.apache.org/r1509671 ] DERBY-6211 : Don't instantiate new optimizers just to create empty CostEstimate objects; commit derby-6211-10-aa-makingCostEstimateObject.diff.
          Hide
          Rick Hillegas added a comment -

          Attaching derby-6211-11-aa-moveTracerOutOfOptimizer.diff. This is a follow-on patch to the previous patch. This patch moves the tracer out of the optimizer to reduce the likelihood that someone will try to instantiate a new optimizer just to get the tracer. I am running tests now.

          Touches the following files:

          M java/engine/org/apache/derby/iapi/sql/compile/Optimizable.java
          M java/engine/org/apache/derby/iapi/sql/compile/Optimizer.java
          M java/engine/org/apache/derby/impl/sql/compile/FromTable.java
          M java/engine/org/apache/derby/impl/sql/compile/NestedLoopJoinStrategy.java
          M java/engine/org/apache/derby/impl/sql/compile/QueryTreeNode.java
          M java/engine/org/apache/derby/impl/sql/compile/JoinNode.java
          M java/engine/org/apache/derby/impl/sql/compile/HashJoinStrategy.java
          M java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java
          M java/engine/org/apache/derby/impl/sql/compile/OptimizerImpl.java

          Show
          Rick Hillegas added a comment - Attaching derby-6211-11-aa-moveTracerOutOfOptimizer.diff. This is a follow-on patch to the previous patch. This patch moves the tracer out of the optimizer to reduce the likelihood that someone will try to instantiate a new optimizer just to get the tracer. I am running tests now. Touches the following files: M java/engine/org/apache/derby/iapi/sql/compile/Optimizable.java M java/engine/org/apache/derby/iapi/sql/compile/Optimizer.java M java/engine/org/apache/derby/impl/sql/compile/FromTable.java M java/engine/org/apache/derby/impl/sql/compile/NestedLoopJoinStrategy.java M java/engine/org/apache/derby/impl/sql/compile/QueryTreeNode.java M java/engine/org/apache/derby/impl/sql/compile/JoinNode.java M java/engine/org/apache/derby/impl/sql/compile/HashJoinStrategy.java M java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java M java/engine/org/apache/derby/impl/sql/compile/OptimizerImpl.java
          Hide
          Rick Hillegas added a comment -

          Attaching second rev of patch, derby-6211-11-ab-moveTracerOutOfOptimizer.diff. This fixes a bug in the first rev which raised NPE in certain situations and hung the tests. I am re-running the tests now.

          Touches the same files as the previous rev.

          Show
          Rick Hillegas added a comment - Attaching second rev of patch, derby-6211-11-ab-moveTracerOutOfOptimizer.diff. This fixes a bug in the first rev which raised NPE in certain situations and hung the tests. I am re-running the tests now. Touches the same files as the previous rev.
          Hide
          ASF subversion and git services added a comment -

          Commit 1509815 from Rick Hillegas in branch 'code/trunk'
          [ https://svn.apache.org/r1509815 ]

          DERBY-6211: Move the tracer out of the optimizer; tests passed cleanly on derby-6211-11-ab-moveTracerOutOfOptimizer.diff.

          Show
          ASF subversion and git services added a comment - Commit 1509815 from Rick Hillegas in branch 'code/trunk' [ https://svn.apache.org/r1509815 ] DERBY-6211 : Move the tracer out of the optimizer; tests passed cleanly on derby-6211-11-ab-moveTracerOutOfOptimizer.diff.
          Hide
          Rick Hillegas added a comment -

          Attaching derby-6211-12-aa-traceEndOfQueryBlock.diff. This patch makes a couple improvements to the xml-based optimizer tracing of outer join queries. Tests passed cleanly for me on this patch.

          The optimization of outer joins is interesting. In some cases, an outer join can be turned into an inner join and optimized as a single query block. However, in other cases, the tables in the outer join cannot be placed into a master join order with the other tables in the query. The following query behaves as an inner join because the compiler sees that the "t3.c1 = t2.c1" qualification will only be satisfied for rows in which t2 inner joined to t1 and therefore t2's columns are not null:

          a)
          select * from t3, (t1 left outer join t2 on t1.c1 = t2.c1) where t3.c1 = t2.c1;

          However, the following query may qualify rows in which t2 outer joined to t1 and t2's columns are null. The following query cannot be turned into one big inner join:

          b)
          select * from t3, (t1 left outer join t2 on t1.c1 = t2.c1) where t3.c1 = t1.c1;

          For query (b), the whole "(t1 left outer join t2 on t1.c1 = t2.c1)" clause is treated as a separate query block. The situation is actually even more complicated than that, and we'll get to that later. But at a high level, the optimizer starts out considering two join orders:

          [t3, QueryBock2]

          and

          [QueryBlock2, t3]

          While evaluating these join orders, the optimizer recurses and optimizes QueryBlock2. As you can see, QueryBlock2 will be optimized twice.

          This situation broke xml-based optimizer tracing in several ways:

          1) The xml tracer died on an NPE when trying to represent the join order and give a compact name to QueryBlock2. The fix was to give QueryBlock2 a generic name, viz., the class name of the query tree node which sits on top of QueryBlock2.

          2) Once that bug was fixed, it became apparent that there was no marker for the end of the recursive optimization of QueryBlock2. This caused trace information to be added to the wrong query block. The fix was to add some recursion to the xml tracer and to add a new trace method which flags the end of an attempt to optimize a query block.

          3) Once that bug was fixed, another variant of problem (1) surfaced: an NPE was raised trying to give QueryBlock2 a compact name in the query plan summary. Again, that problem was fixed by using the class name of the query tree node which sits on top of QueryBlock2.

          As promised above, the situation is actually a bit more complicated. Remember that QueryBlock2 represents the following clause:

          (t1 left outer join t2 on t1.c1 = t2.c1)

          The optimizer doesn't have many tricks for this outer join. It can't permute the join order. t1 must be outer to t2. So, the optimizer treats this clause as two separate query blocks and optimizes them separately. This happens because the HalfOuterJoinNode is really a TableOperatorNode, just like a UNION. Each branch of the HalfOuterJoinNode is treated as its own query block just as each branch of a UNION is treated as a separate, independent query block.

          I hope that this explanation makes it easier to interpret the results of the following script. You will notice two plans considered for query (b):

          ( ProjectRestrictNode * "APP"."SQL130807054111430" )

          ( "APP"."67bb80b4-0140-58cb-8920-00000383a238" * ProjectRestrictNode )

          Here's the script:

          connect 'jdbc:derby:memory:db;create=true';

          create table t1
          (
          c1 int,
          c2 int,
          c3 int,
          constraint cons1 primary key(c1, c2)
          );

          create table t2
          (
          c1 int not null,
          c2 int not null,
          c3 int,
          constraint cons2 unique(c1, c2)
          );

          create table t3
          (
          c1 int,
          c2 int,
          c3 int,
          constraint cons3 primary key(c1, c2)
          );

          select cast( t.tablename as varchar(2)), c.conglomeratename
          from sys.systables t, sys.sysconglomerates c
          where t.tableid = c.tableid
          and t.tabletype = 'T'
          order by t.tablename;

          --turn on xml-based optimizer tracing
          call syscs_util.syscs_register_tool( 'optimizerTracing', true, 'xml' );

          select * from t1 left outer join t2 on t1.c1 = t2.c1;

          select * from t3, (t1 left outer join t2 on t1.c1 = t2.c1) where t3.c1 = t2.c1;

          select * from t3, (t1 left outer join t2 on t1.c1 = t2.c1) where t3.c1 = t1.c1;

          – turn off optimizer tracing
          call syscs_util.syscs_register_tool( 'optimizerTracing', false, 'optimizerTrace.xml' );

          – load the trace viewer
          call syscs_util.syscs_register_tool( 'optimizerTracingViews', true, 'optimizerTrace.xml' );

          – view the costs of all complete plans
          select stmtID, qbID, estimatedCost, summary from planCost
          where complete
          order by stmtID, qbID, estimatedCost
          ;

          – unload the trace viewer
          call syscs_util.syscs_register_tool( 'optimizerTracingViews', false );

          Touches the following files:

          ---------------

          M java/engine/org/apache/derby/impl/sql/compile/XMLOptTrace.java

          Added recursive treatment of query blocks and fixed the NPEs.

          ---------------

          M java/engine/org/apache/derby/iapi/sql/compile/OptimizerPlan.java

          Added a new kind of node in the OptimizerPlan: DeadEnd. This is a node for cases when the xml tracer can't peer inside an Optimizable. For instance, when the Optimizable is another query block.

          ---------------

          M java/engine/org/apache/derby/iapi/sql/compile/OptTrace.java
          M java/engine/org/apache/derby/impl/sql/compile/DefaultOptTrace.java
          M java/engine/org/apache/derby/impl/sql/compile/OptimizerImpl.java
          M java/testing/org/apache/derbyTesting/functionTests/tests/lang/DummyOptTrace.java

          Changed the name of the traceStart() method to traceStartQueryBlock(). Added a new traceEndQueryBlock() method.

          ---------------

          M java/engine/org/apache/derby/impl/sql/compile/TableOperatorNode.java

          Adds a traceEndQueryBlock() call after a recursive optimization finishes.

          Show
          Rick Hillegas added a comment - Attaching derby-6211-12-aa-traceEndOfQueryBlock.diff. This patch makes a couple improvements to the xml-based optimizer tracing of outer join queries. Tests passed cleanly for me on this patch. The optimization of outer joins is interesting. In some cases, an outer join can be turned into an inner join and optimized as a single query block. However, in other cases, the tables in the outer join cannot be placed into a master join order with the other tables in the query. The following query behaves as an inner join because the compiler sees that the "t3.c1 = t2.c1" qualification will only be satisfied for rows in which t2 inner joined to t1 and therefore t2's columns are not null: a) select * from t3, (t1 left outer join t2 on t1.c1 = t2.c1) where t3.c1 = t2.c1; However, the following query may qualify rows in which t2 outer joined to t1 and t2's columns are null. The following query cannot be turned into one big inner join: b) select * from t3, (t1 left outer join t2 on t1.c1 = t2.c1) where t3.c1 = t1.c1; For query (b), the whole "(t1 left outer join t2 on t1.c1 = t2.c1)" clause is treated as a separate query block. The situation is actually even more complicated than that, and we'll get to that later. But at a high level, the optimizer starts out considering two join orders: [t3, QueryBock2] and [QueryBlock2, t3] While evaluating these join orders, the optimizer recurses and optimizes QueryBlock2. As you can see, QueryBlock2 will be optimized twice. This situation broke xml-based optimizer tracing in several ways: 1) The xml tracer died on an NPE when trying to represent the join order and give a compact name to QueryBlock2. The fix was to give QueryBlock2 a generic name, viz., the class name of the query tree node which sits on top of QueryBlock2. 2) Once that bug was fixed, it became apparent that there was no marker for the end of the recursive optimization of QueryBlock2. This caused trace information to be added to the wrong query block. The fix was to add some recursion to the xml tracer and to add a new trace method which flags the end of an attempt to optimize a query block. 3) Once that bug was fixed, another variant of problem (1) surfaced: an NPE was raised trying to give QueryBlock2 a compact name in the query plan summary. Again, that problem was fixed by using the class name of the query tree node which sits on top of QueryBlock2. As promised above, the situation is actually a bit more complicated. Remember that QueryBlock2 represents the following clause: (t1 left outer join t2 on t1.c1 = t2.c1) The optimizer doesn't have many tricks for this outer join. It can't permute the join order. t1 must be outer to t2. So, the optimizer treats this clause as two separate query blocks and optimizes them separately. This happens because the HalfOuterJoinNode is really a TableOperatorNode, just like a UNION. Each branch of the HalfOuterJoinNode is treated as its own query block just as each branch of a UNION is treated as a separate, independent query block. I hope that this explanation makes it easier to interpret the results of the following script. You will notice two plans considered for query (b): ( ProjectRestrictNode * "APP"."SQL130807054111430" ) ( "APP"."67bb80b4-0140-58cb-8920-00000383a238" * ProjectRestrictNode ) Here's the script: connect 'jdbc:derby:memory:db;create=true'; create table t1 ( c1 int, c2 int, c3 int, constraint cons1 primary key(c1, c2) ); create table t2 ( c1 int not null, c2 int not null, c3 int, constraint cons2 unique(c1, c2) ); create table t3 ( c1 int, c2 int, c3 int, constraint cons3 primary key(c1, c2) ); select cast( t.tablename as varchar(2)), c.conglomeratename from sys.systables t, sys.sysconglomerates c where t.tableid = c.tableid and t.tabletype = 'T' order by t.tablename; --turn on xml-based optimizer tracing call syscs_util.syscs_register_tool( 'optimizerTracing', true, 'xml' ); select * from t1 left outer join t2 on t1.c1 = t2.c1; select * from t3, (t1 left outer join t2 on t1.c1 = t2.c1) where t3.c1 = t2.c1; select * from t3, (t1 left outer join t2 on t1.c1 = t2.c1) where t3.c1 = t1.c1; – turn off optimizer tracing call syscs_util.syscs_register_tool( 'optimizerTracing', false, 'optimizerTrace.xml' ); – load the trace viewer call syscs_util.syscs_register_tool( 'optimizerTracingViews', true, 'optimizerTrace.xml' ); – view the costs of all complete plans select stmtID, qbID, estimatedCost, summary from planCost where complete order by stmtID, qbID, estimatedCost ; – unload the trace viewer call syscs_util.syscs_register_tool( 'optimizerTracingViews', false ); Touches the following files: --------------- M java/engine/org/apache/derby/impl/sql/compile/XMLOptTrace.java Added recursive treatment of query blocks and fixed the NPEs. --------------- M java/engine/org/apache/derby/iapi/sql/compile/OptimizerPlan.java Added a new kind of node in the OptimizerPlan: DeadEnd. This is a node for cases when the xml tracer can't peer inside an Optimizable. For instance, when the Optimizable is another query block. --------------- M java/engine/org/apache/derby/iapi/sql/compile/OptTrace.java M java/engine/org/apache/derby/impl/sql/compile/DefaultOptTrace.java M java/engine/org/apache/derby/impl/sql/compile/OptimizerImpl.java M java/testing/org/apache/derbyTesting/functionTests/tests/lang/DummyOptTrace.java Changed the name of the traceStart() method to traceStartQueryBlock(). Added a new traceEndQueryBlock() method. --------------- M java/engine/org/apache/derby/impl/sql/compile/TableOperatorNode.java Adds a traceEndQueryBlock() call after a recursive optimization finishes.
          Hide
          ASF subversion and git services added a comment -

          Commit 1511315 from Rick Hillegas in branch 'code/trunk'
          [ https://svn.apache.org/r1511315 ]

          DERBY-6211: Trace the end of optimizing a query block so that the xml-based optimizer tracer can associate trace information with the correct query block; commits derby-6211-12-aa-traceEndOfQueryBlock.diff.

          Show
          ASF subversion and git services added a comment - Commit 1511315 from Rick Hillegas in branch 'code/trunk' [ https://svn.apache.org/r1511315 ] DERBY-6211 : Trace the end of optimizing a query block so that the xml-based optimizer tracer can associate trace information with the correct query block; commits derby-6211-12-aa-traceEndOfQueryBlock.diff.
          Hide
          Rick Hillegas added a comment -

          Attaching derby-6211-13-aa-SelectNode_optimizer.diff. This patch moves the getOptimizer() method out of ResultSetNode down into SelectNode, the only subclass which actually calls it. The optimizer creation in SelectNode is balanced by a call to traceEndQueryBlock() just as was done to TableOperatorNode in the previous patch. I am running tests now.

          At this point, the SelectNode and TableOperatorNode are the only nodes which create optimizers. This helps us reason about where optimizable query blocks occur and ensure that every optimizer creation is balanced with a call to traceEndQueryBlock() after join optimization via that optimizer is finished.

          I would like to be able to hide or throw away the OptimizerImpl after join optimization is done. But that is not possible right now. Before explaining why, let me first sketch my current understanding of the phases of optimization:

          1) Preprocessing. This is the phase in which the query is rewritten. Rewriting tasks include subquery flattening and putting predicates into conjunctive normal form.

          2) Join optimization. Most of what we describe as optimization happens in this phase. It is the phase which selects join orders, join strategies, and access paths.

          3) Projection and restriction. This is the phase in which predicates are pushed down as close to the row sources as possible. And rows are pruned back to the minimal number of columns needed by higher operators in the plan. The workhorses for this phase are the modifyAccessPath() methods. There is some other miscellaneous, mechanical processing in this phase related to the CostEstimates which were calculated during join optimization.

          OptimizerImpl is responsible for phases (2) and (3). When a ResultSetNode is join optimized, the OptimizerImpl is saved away in ResultSetNode.optimizer so that it can be dug up later to perform projection and restriction. The OptimizerImpl used for join optimization retains information which is needed for projection and restriction.

          I don't like the fact that the OptimizerImpl is stashed away. That makes it hard to reason about when join optimization is done. I hope that the traceEndQueryBlock() calls now mark the end of join optimization attempts. I would prefer to save just the projection/restriction and CostEstimate variables rather than the whole OptimizerImpl. But that's a rototill I don't want to embark on now.

          Touches the following files:

          -------------

          M java/engine/org/apache/derby/impl/sql/compile/ResultSetNode.java
          M java/engine/org/apache/derby/impl/sql/compile/SelectNode.java

          Moves getOptimizer() down into SelectNode and adds a traceEndQueryBlock() call.

          -------------

          M java/testing/org/apache/derbyTesting/functionTests/tests/lang/XMLOptimizerTraceTest.java

          Adds a test for xml-based optimizer tracing of outer joins.

          Show
          Rick Hillegas added a comment - Attaching derby-6211-13-aa-SelectNode_optimizer.diff. This patch moves the getOptimizer() method out of ResultSetNode down into SelectNode, the only subclass which actually calls it. The optimizer creation in SelectNode is balanced by a call to traceEndQueryBlock() just as was done to TableOperatorNode in the previous patch. I am running tests now. At this point, the SelectNode and TableOperatorNode are the only nodes which create optimizers. This helps us reason about where optimizable query blocks occur and ensure that every optimizer creation is balanced with a call to traceEndQueryBlock() after join optimization via that optimizer is finished. I would like to be able to hide or throw away the OptimizerImpl after join optimization is done. But that is not possible right now. Before explaining why, let me first sketch my current understanding of the phases of optimization: 1) Preprocessing. This is the phase in which the query is rewritten. Rewriting tasks include subquery flattening and putting predicates into conjunctive normal form. 2) Join optimization. Most of what we describe as optimization happens in this phase. It is the phase which selects join orders, join strategies, and access paths. 3) Projection and restriction. This is the phase in which predicates are pushed down as close to the row sources as possible. And rows are pruned back to the minimal number of columns needed by higher operators in the plan. The workhorses for this phase are the modifyAccessPath() methods. There is some other miscellaneous, mechanical processing in this phase related to the CostEstimates which were calculated during join optimization. OptimizerImpl is responsible for phases (2) and (3). When a ResultSetNode is join optimized, the OptimizerImpl is saved away in ResultSetNode.optimizer so that it can be dug up later to perform projection and restriction. The OptimizerImpl used for join optimization retains information which is needed for projection and restriction. I don't like the fact that the OptimizerImpl is stashed away. That makes it hard to reason about when join optimization is done. I hope that the traceEndQueryBlock() calls now mark the end of join optimization attempts. I would prefer to save just the projection/restriction and CostEstimate variables rather than the whole OptimizerImpl. But that's a rototill I don't want to embark on now. Touches the following files: ------------- M java/engine/org/apache/derby/impl/sql/compile/ResultSetNode.java M java/engine/org/apache/derby/impl/sql/compile/SelectNode.java Moves getOptimizer() down into SelectNode and adds a traceEndQueryBlock() call. ------------- M java/testing/org/apache/derbyTesting/functionTests/tests/lang/XMLOptimizerTraceTest.java Adds a test for xml-based optimizer tracing of outer joins.
          Hide
          Rick Hillegas added a comment -

          Tests passed cleanly for me on derby-6211-13-aa-SelectNode_optimizer.diff.

          Show
          Rick Hillegas added a comment - Tests passed cleanly for me on derby-6211-13-aa-SelectNode_optimizer.diff.
          Hide
          ASF subversion and git services added a comment -

          Commit 1512294 from Rick Hillegas in branch 'code/trunk'
          [ https://svn.apache.org/r1512294 ]

          DERBY-6211: Mark the end of join optimization with a call to traceEndQueryBlock(); commit derby-6211-13-aa-SelectNode_optimizer.diff.

          Show
          ASF subversion and git services added a comment - Commit 1512294 from Rick Hillegas in branch 'code/trunk' [ https://svn.apache.org/r1512294 ] DERBY-6211 : Mark the end of join optimization with a call to traceEndQueryBlock(); commit derby-6211-13-aa-SelectNode_optimizer.diff.
          Hide
          Knut Anders Hatlen added a comment -

          Is there more work planned on this issue, or can it be resolved?

          Show
          Knut Anders Hatlen added a comment - Is there more work planned on this issue, or can it be resolved?
          Hide
          Rick Hillegas added a comment -

          Thanks for the nudge, Knut. Additional work in this area can happen on other JIRAs. Resolving.

          Show
          Rick Hillegas added a comment - Thanks for the nudge, Knut. Additional work in this area can happen on other JIRAs. Resolving.
          Hide
          Rick Hillegas added a comment -

          Optimizer tracing is now an optional tool. Enabling old-style optimizer tracing is described here: http://wiki.apache.org/db-derby/OptimizerTracing

          Enabling xml-style optimizer tracing is described here: http://wiki.apache.org/db-derby/XmlOptimizerTracing

          Show
          Rick Hillegas added a comment - Optimizer tracing is now an optional tool. Enabling old-style optimizer tracing is described here: http://wiki.apache.org/db-derby/OptimizerTracing Enabling xml-style optimizer tracing is described here: http://wiki.apache.org/db-derby/XmlOptimizerTracing

            People

            • Assignee:
              Rick Hillegas
              Reporter:
              Rick Hillegas
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development