|
Tars Joris made changes - 29/Oct/08 08:58 AM
Kathey Marsden made changes - 30/Oct/08 07:06 PM
I verified the report then dropped and recreated the index key3. The records ordered properly after that. Go figure...?? With the newly created cardinality statistics it may be the correct index is selected this time but the original results are still wrong.
ij> SELECT table1.id, m0.value, m1.value FROM table1, table2 m0, table2 m1 WHERE tab ER BY m0.value; ID |VALUE ---------------------------------------------------------------------------------- ------------------------------------------------------------------------------ 2147483653 |000002 2147483654 |000003 4294967297 |000001 3 rows selected ij> drop index key3; 0 rows inserted/updated/deleted ij> SELECT table1.id, m0.value, m1.value FROM table1, table2 m0, table2 m1 WHERE table1.id=m0.id AND m0.name='PageSequenceId' AND table1.id=m1.id AND m1.name='PostComponentId' AND m1.value='21857' ORDER BY m0.value; ID |VALUE ------------------------------------------------------------------------------------ ------------------------------------------------------------------------------ 4294967297 |000001 2147483653 |000002 2147483654 |000003 3 rows selected ij> CREATE INDEX key3 ON table2(value); 0 rows inserted/updated/deleted ij> SELECT table1.id, m0.value, m1.value FROM table1, table2 m0, table2 m1 WHERE table1.id=m0.id AND m0.name='PageSequenceId' AND table1.id=m1.id AND m1.name='PostComponentId' AND m1.value='21857' ORDER BY m0.value; ID |VALUE ------------------------------------------------------------------------------------ ------------------------------------------------------------------------------ 4294967297 |000001 2147483653 |000002 2147483654 |000003 3 rows selected I ran the reporducible test case with couple different releases of Derby and noticed following
1)on trunk, if I quit out of the ij session which does the setup and runs the query with incorrect results, and then start a new session and run the query again it returns correct result. I didn't have to drop and recreate the index for it to work. The plan used in the 2 cases for the query are different 2)I found the same behavior in 10.2 codeline 3)With 10.1 codeline, I ran into npe when trying to run the setup script. Not sure what is going on there. I will enter a jira entry for the 10.1 behavior. The query plan when the incorrect results are returned is as follows
Statement Name: null Statement Text: SELECT table1.id, m0.value, m1.value FROM table1, table2 m0, table2 m1 W HERE table1.id=m0.id AND m0.name='PageSequenceId' AND table1.id=m1.id AND m1.name='PostComponentId' AND m 1.value='21857' ORDER BY m0.value Parse Time: 0 Bind Time: 0 Optimize Time: 0 Generate Time: 0 Compile Time: 0 Execute Time: 0 Begin Compilation Timestamp : null End Compilation Timestamp : null Begin Execution Timestamp : null End Execution Timestamp : null Statement Execution Plan Text: Project-Restrict ResultSet (10): Number of opens = 1 Rows seen = 3 Rows filtered = 0 restriction = false projection = true constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 restriction time (milliseconds) = 0 projection time (milliseconds) = 0 optimizer estimated row count: 0.00 optimizer estimated cost: 3.52 Source result set: Nested Loop Join ResultSet: Number of opens = 1 Rows seen from the left = 168 Rows seen from the right = 3 Rows filtered = 0 Rows returned = 3 constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 optimizer estimated row count: 0.00 optimizer estimated cost: 3.52 Left result set: Nested Loop Join ResultSet: Number of opens = 1 Rows seen from the left = 3 Rows seen from the right = 168 Rows filtered = 0 Rows returned = 168 constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 optimizer estimated row count: 0.00 optimizer estimated cost: 3.52 Left result set: Project-Restrict ResultSet (5): Number of opens = 1 Rows seen = 3 Rows filtered = 0 restriction = true projection = true constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 restriction time (milliseconds) = 0 projection time (milliseconds) = 0 optimizer estimated row count: 0.00 optimizer estimated cost: 3.52 Source result set: Index Row to Base Row ResultSet for TABLE2: Number of opens = 1 Rows seen = 3 Columns accessed from heap = {0, 1, 2} constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 optimizer estimated row count: 0.00 optimizer estimated cost: 3.52 Index Scan ResultSet for TABLE2 using index KEY3 at read committed isolation level using share row locking chosen by the optimizer Number of opens = 1 Rows seen = 3 Rows filtered = 0 Fetch Size = 1 constructor time (milliseconds)= 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 next time in milliseconds/row =0 scan information: Bit set of columns fetched=All Number of columns fetched=2 Number of deleted rows visited=0 Number of pages visited=2 Number of rows qualified=3 Number of rows visited=4 Scan type=btree Tree height=-1 start position: >= on first 1 column(s). Ordered null semantics on the following columns: stop position: > on first 1 column(s). Ordered null semantics on the following columns: qualifiers:None optimizer estimated row count: 0.00 optimizer estimated cost: 3.52 Right result set: Project-Restrict ResultSet (8): Number of opens = 3 Rows seen = 8688 Rows filtered = 8520 restriction = true projection = true constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 restriction time (milliseconds) = 0 projection time (milliseconds) = 0 optimizer estimated row count: 0.00 optimizer estimated cost: 0.00 Source result set: Index Row to Base Row ResultSet for TABLE2: Number of opens = 3 Rows seen = 8688 Columns accessed from heap = {0, 1, 2} constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 optimizer estimated row count: 0.00 optimizer estimated cost: 0.00 Index Scan ResultSet for TABLE2 using index KEY3 at read committed isolation level using share row locking chosen by the optimizer Number of opens = 3 Rows seen = 8688 Rows filtered = 0 Fetch Size = 1 constructor time (milliseconds)= 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 next time in milliseconds/row =0 scan information: Bit set of columns fetched={1} Number of columns fetched=1 Number of deleted rows visited=0 Number of pages visited=12 Number of rows qualified=8688 Number of rows visited=8688 Scan type=btree Tree height=2 start position: None stop position: None qualifiers:None optimizer estimated row count: 0.00 optimizer estimated cost: 0.00 Right result set: Table Scan ResultSet for TABLE1 at read committed isolation level using instantaneous share row locking chosen by the optimizer Number of opens = 168 Rows seen = 3 Rows filtered = 0 Fetch Size = 16 constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 next time in milliseconds/row = 0 scan information: Bit set of columns fetched=All Number of columns fetched=1 Number of pages visited=1 Number of rows qualified=3 Number of rows visited=9408 Scan type=heap start position: null stop position: null qualifiers: Column[0][0] Id: 0 Operator: = Ordered nulls: false Unknown return value: false Negate comparison result: false Column[0][1] Id: 0 Operator: = Ordered nulls: false Unknown return value: false Negate comparison result: false optimizer estimated row count: 0.00 optimizer estimated cost: 0.00 The query plan when the correct results are returned is as follows
Statement Name: null Statement Text: SELECT table1.id, m0.value, m1.value FROM table1, table2 m0, table2 m1 W HERE table1.id=m0.id AND m0.name='PageSequenceId' AND table1.id=m1.id AND m1.name='PostComponentId' AND m 1.value='21857' ORDER BY m0.value Parse Time: 0 Bind Time: 0 Optimize Time: 0 Generate Time: 0 Compile Time: 0 Execute Time: 0 Begin Compilation Timestamp : null End Compilation Timestamp : null Begin Execution Timestamp : null End Execution Timestamp : null Statement Execution Plan Text: Sort ResultSet: Number of opens = 1 Rows input = 3 Rows returned = 3 Eliminate duplicates = false In sorted order = false Sort information: Number of rows input=3 Number of rows output=3 Sort type=internal constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 optimizer estimated row count: 0.20 optimizer estimated cost: 8.26 Source result set: Project-Restrict ResultSet (9): Number of opens = 1 Rows seen = 3 Rows filtered = 0 restriction = false projection = true constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 restriction time (milliseconds) = 0 projection time (milliseconds) = 0 optimizer estimated row count: 0.20 optimizer estimated cost: 8.26 Source result set: Nested Loop Exists Join ResultSet: Number of opens = 1 Rows seen from the left = 3 Rows seen from the right = 3 Rows filtered = 0 Rows returned = 3 constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 optimizer estimated row count: 0.20 optimizer estimated cost: 8.26 Left result set: Nested Loop Exists Join ResultSet: Number of opens = 1 Rows seen from the left = 3 Rows seen from the right = 3 Rows filtered = 0 Rows returned = 3 constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 optimizer estimated row count: 0.20 optimizer estimated cost: 7.29 Left result set: Project-Restrict ResultSet (5): Number of opens = 1 Rows seen = 3 Rows filtered = 0 restriction = true projection = true constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 restriction time (milliseconds) = 0 projection time (milliseconds) = 0 optimizer estimated row count: 0.20 optimizer estimated cost: 6.9 7 Source result set: Index Row to Base Row ResultSet for TABL E2: Number of opens = 1 Rows seen = 3 Columns accessed from heap = {0, 1, 2} constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 optimizer estimated row count: 0.20 optimizer estimated cost: 6.97 Index Scan ResultSet for TABLE2using index KEY3 at read committed isolation level using instantaneous share row locking chosen by the optimizer Number of opens = 1 Rows seen = 3 Rows filtered = 0 Fetch Size = 16 constructor time (millis econds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds ) = 0 next time in millisecond s/row = 0 scan information: Bit set of columns fetch ed=All Number of columns fetche d=2 Number of deleted rows v isited=0 Number of pages visited= 2 Number of rows qualified =3 Number of rows visited=4 Scan type=btree Tree height=2 start position: >= on first 1 column(s). Ordered null semantics on the following columns: stop position: > on first 1 column(s). Ordered null semantics on the following columns: qualifiers: None optimizer estimated row count: 0.20 optimizer estimated cost : 6.97 Right result set: Index Scan ResultSet for TABLE1 using constraint SQL090320113016460 at read committed isolation level using share row locking chosen by the optimizer Number of opens = 3 Rows seen = 3 Rows filtered = 0 Fetch Size = 1 constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 next time in milliseconds/row = 0 scan information: Bit set of columns fetched={0} Number of columns fetched=1 Number of deleted rows visited=0 Number of pages visited=3 Number of rows qualified=3 Number of rows visited=3 Scan type=btree Tree height=1 start position: >= on first 1 column(s). Ordered null semantics on the following columns: 0 stop position: > on first 1 column(s). Ordered null semantics on the following columns: 0 qualifiers: None optimizer estimated row count: 0.20 optimizer estimated cost: 0.31 Right result set: Index Row to Base Row ResultSet for TABLE2: Number of opens = 3 Rows seen = 3 Columns accessed from heap = {2} constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 optimizer estimated row count: 0.20 optimizer estimated cost: 0.97 Index Scan ResultSet for TABLE2 using constraint SQL090320113016670 at read committed isolation level using share row locking chosen by the optimizer Number of opens = 3 Rows seen = 3 Rows filtered = 0 Fetch Size = 1 constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 next time in milliseconds/row = 0 scan information: Bit set of columns fetched=All Number of columns fetched=3 Number of deleted rows visited=0 Number of pages visited=6 Number of rows qualified=3 Number of rows visited=3 Scan type=btree Tree height=2 start position: >= on first 2 column(s). Ordered null semantics on the following columns:0 1 stop position: > on first 2 column(s). Ordered null semantics on the following columns:0 1 qualifiers:None optimizer estimated row count: 0.20 optimizer estimated cost: 0.97 to me looks like the problem comes with plan that is trying to do sort avoidance. The working plan has a sort node on the top.
But to me the "bad" plan looks like a valid sort avoidance plan. Seems to have index scans on the left side, using key3 which is sorted on the value that the query in the end wants to be sorted on. could it be that the leftmost join is actually on the m1.value part of the query rather than the m0.value of the query. maybe something got confused about the self join with 2 different "range" variables.
Mamta A. Satoor made changes - 20/Mar/09 09:17 PM
Once
I noticed that this issue also occurs on versions back to 10.1
Kathey Marsden made changes - 20/Mar/09 09:54 PM
My apologies for false alarm on 10.1 codeline. I noticed that I had some local changes made on that client. Once i reverted them, the npe went away. I will mark the new jira entry DERBY-4015 for npe as not a bug.
I will spend little time on this jira. As Mike pointed out, it is a possibilit that in the wrong result case, may be we are getting confused with m0 and m1 which basically are involved in a self join.
For a given query, if the optimizer chooses a query plan with an index key for say a where clause and it finds that the same key can be used to satisfy the order by, then in that case, optimizer will avoid doing the sorting required for order by because the rows are already in order based on the index key used to satisfy where clause. In this particular case, there is an index called key3 on table2.value which can be used to satisfy the m1.value='21857'. It is possible that may be there is some bug in optimizer which thinks that the index used for m1.value can be used to avoid the sorting required for m0.value. But the sort should not be avoided because the order by is on m0.value and not m1.value and hence index used for m1.value can't be used to avoid the sort on m0.value. I will debug this a little to see what is index key3 being used for ie m0 or m1. It is unclear from the query plan whether the index key3 is being used on m0 or m1. SELECT table1.id, m0.value, m1.value FROM table1, table2 m0, table2 m1 WHERE table1.id=m0.id AND m0.name='PageSequenceId' AND table1.id=m1.id AND m1.name='PostComponentId' AND m1.value='21857' ORDER BY m0.value; I debugged to see which table is picked as the leftmost table since it is unclear from the query plan what we mean when we say TABLE2 is the leftmost table. There is a self-join involved with TABLE2 and we use aliases m0 and m1 for them but in the query plan, we talk about them as just TABLE2 and not along with their alias names. It appears that in both failing and passing case, m1 is the leftmost table in the query plan chosen for the queries. (I made one line change in my client to impl\sql\compile\IndexToBaseRowNode.java where rather than just sending the table name, I send both table name and alias name to the query plan collection logic. The code change is shown below
- mb.push(source.getBaseTableName()); + mb.push(source.getBaseTableName() + " " + source.getCorrelationName()); With the change above, I can see in the query plan for the failing and passing case that the leftmost resultset is for TABLE2 M1. I will go ahead and enter a jira entry to make query plan more readable by including the alias name along with the table name. I will next look at why we decide to skip the sorting in the failing case. That seems to be the only difference between the 2 query plans. Including the alias name in the query plan output seems like a great idea. Thanks for tracking that down!
Hi Mamta thanks for all the hard work on this.
I just wanted to comment that I think your theory that "may be there is some bug in optimizer which thinks that the index used for m1.value can be used to avoid the sorting required for m0.value" is a really good theory, and I think this is a good idea to explore in depth. I also agree that including the alias name would be great. Make sure to include a sample of what the query plan looks like after your change in the other jira.
I also was guessing that the optimizer got confused by the two aliases on the same table. It seems like you should find where in the optimizer we determine how the current query plan is sorted and we must compare that with how the query wants to be sorted to determine if the current plan is a valid "sort avoidance" plan. My guess is that at this point we somehow pass in that the plan is sorted a table2.value but somewhere we are not recognizing that we need to check for m0.value. but just a guess. It could also be a bug where we are passing in the wrong alias to the check. I spent some time in comparing the code path for the working case and non-working case in a debugger. What I found is in optimize phase, at one point, one of the join orders we try in non-working case is 2,0,1 and for the working case it is 2,0,-1.
0 stands for Table1 and 1 stands for Table2 m0, 2 stands for Table2 m1. -1 means that join position is not being used. I am not sure why we are picking different join orders. In addition to that, I see that for the 2,0,1 join order in non-working case, we indeed mark sort avoidance for Table2 m0, I definitely need to do more debugging to see why we think for this join order, we can avoid sorting for Table2 m0. I realize this is not much information but wanted to share what I have. Another interesting thing I noticed (in both working and non-working case) is we do recognize the optimize phase that sorting is required for m0 for the order by clause as shown belowHere, for the join order [2, 0, 1], we identify that sorting is required for M0.
Thread [main] (Suspended) OrderByList.sortRequired(RowOrdering, JBitSet) line: 549 Level2OptimizerImpl(OptimizerImpl).costBasedCostOptimizable(Optimizable, TableDescriptor, ConglomerateDescriptor, OptimizablePredicateList, CostEstimate) line: 2248 Level2OptimizerImpl(OptimizerImpl).costOptimizable(Optimizable, TableDescriptor, ConglomerateDescriptor, OptimizablePredicateList, CostEstimate) line: 1984 FromBaseTable.optimizeIt(Optimizer, OptimizablePredicateList, CostEstimate, RowOrdering) line: 521 ProjectRestrictNode.optimizeIt(Optimizer, OptimizablePredicateList, CostEstimate, RowOrdering) line: 316 Level2OptimizerImpl(OptimizerImpl).costPermutation() line: 1938 SelectNode.optimize(DataDictionary, PredicateList, double) line: 1767 CursorNode(DMLStatementNode).optimizeStatement() line: 305 CursorNode.optimizeStatement() line: 515 GenericStatement.prepMinion(LanguageConnectionContext, boolean, Object[], SchemaDescriptor, boolean) line: 367 GenericStatement.prepare(LanguageConnectionContext, boolean) line: 88 GenericLanguageConnectionContext.prepareInternalStatement(SchemaDescriptor, String, boolean, boolean) line: 802 EmbedStatement40(EmbedStatement).execute(String, boolean, boolean, int, int[], String[]) line: 606 ij.executeImmediate(String) line: 329 utilMain.doCatch(String) line: 505 utilMain.runScriptGuts() line: 347 utilMain.go(LocalizedInput[], LocalizedOutput) line: 245 Main.go(LocalizedInput, LocalizedOutput) line: 210 Main.mainCore(String[], Main) line: 177 Main.main(String[]) line: 73 Main.main(String[]) line: 73 ij.main(String[]) line: 59 But it is obvious from the query plan for non-working that somehow we later decide to do sort avoidance for m0. I will look more to see where the optimizer changes it mind about the sort requirement for m0. For the incorrect case, the join order picked is 2,1,0 which m1, m0, table1 where as for the correct case, the join order picked is 2,0,1 which is m1,table1,m0. The other interesting thing in case of incorrect case is, for m0, for the plan used for m0, both the "optimizer estimated row count" and "optimizer estimated cost" are 0.00. Is it possible for the optimizer to be able to find a plan for a table such that the "optimizer estimated row count" and "optimizer estimated cost" can be 0? Maybe that is a valid situation, I don't know enough about the optimizer yet to answer that question but I though those statistics were worth bringing it up for discussion.
During query optimization phase, we need to know the estimated row count for the tables involved in the query. What I have found during debugging of the correct case and incorrect case is that the estimated row count is not the same in the 2 cases. When we finish the database session in which the tables, indexes are created, as part of closing the session, we must update the estimated row count and that would explain why the count is different for the same table. Of course, this change in estimated row count must be factoring in what plan gets picked up by the optimizer.
Next, I am planning on pursuing the broken case cost estimates during different join permutation because it seems like we are ending up with scenarios where the cost estimate shows the cost and row count to be 0 and hence we end up picking that plan because it is costing nothing to pickup that plan. Like I said in my earlier comment, I am not sure if it is ok to cost nothing for a plan and hence it ending up becoming the best plan and probably the best sort avoidance plan too. costEstimate cost 0.0 rowCount 0.0 I don't think tracking down the difference in estimated row count is going to lead you to the solution to this problem. For this issue it is key to figure out why the particular plan chosen in the bug case is wrong. It may be that the cost of the plan is wrong, but that should not affect whether any chose plan gives the right result - the cost should never affect that.
So again I believe the key thing here is to track down why this plan that has m1 as the outermost table thinks that this plan does not have to add a sort at the top to get proper ordering required by order by m0. My guess is that the bad plan uses the same key for m1 and m0 which are just ranges on the same table, but has incorrect logic to not see that it needs m0 using index key as outer most rather than just index key. Also don't get confused when the bug is fixed. My guess is that once the bug is fixed that the currently chosen plan will get marked correctly as NOT a sort avoidance plan which then will require an additional sort node which will add to the cost. This will probably make it such that the optimizer no longer picks this plan. Yes, even if the cost calculation for a plan is wrong, Derby should not return wrong resutls and hence for this jira entry, at this point, it might not be worth it to pursue the lead if the cost calculation is wrong.
I did find one consistently reproducible SQL which will cause the problem behavior whether we are in the same session where the tables/indexes were created or whether we start a fresh database session. So, once the database has been setup, one can open a new ij session and consistently repro the problem case with following optimizer overrides (this way, one does not have to setup the whole database in the same session as the origina problem SQL to repro the problem) SELECT table1.id, m0.value, m1.value FROM --DERBY-PROPERTIES joinOrder=FIXED table2 m1 -- DERBY-PROPERTIES index=key3 , table2 m0 -- DERBY-PROPERTIES index=key3 , table1 WHERE table1.id=m0.id AND m0.name='PageSequenceId' AND table1.id=m1.id AND m1.name='PostComponentId' AND m1.value='21857' ORDER BY m0.value; So, the important thing is have both m1 and m0 use the index KEY3 which is on the column value on which ordering is happening for table m0. Now that I have a simple repro case (ie I don't have to go through countless iteration of optimizer for all different join orders and different predicate pulling down in different join orders), I can focus on the problem join order. I have done more debugging on the simpler query with optimizer overrides as shown below
SELECT table1.id, m0.value, m1.value FROM --DERBY-PROPERTIES joinOrder=FIXED table2 m1 -- DERBY-PROPERTIES index=key3 , table2 m0 -- DERBY-PROPERTIES index=key3 , table1 WHERE table1.id=m0.id AND m0.name='PageSequenceId' AND table1.id=m1.id AND m1.name='PostComponentId' AND m1.value='21857' ORDER BY m0.value; For the query above, when the optimizer will start considering the join orders, it is going to assume table2 m1 is 0, table2 m0 is 1 and table1 is 3. So, when I say in my analysis below, that optimizer is working with say join order 0,1,-1, what I mean is optimizer is considering the join order m1,m0,-1. -1 means we are not taking into consideration any table for that position at this point. With that in mind, let me share what is happening and what part of it I don't understand. For the query above, optimizer first considers 0,-1,-1 and in OptimizerImpl.costBasedCostOptimizable line 2248 which is shown as below if (requiredRowOrdering.sortRequired(currentRowOrdering,assignedTableMap) == RequiredRowOrdering.NOTHING_REQUIRED) The 2 params to the call above are set as follows At this point, currentRowOrdering RowOrderingImpl (id=62) alwaysOrderedOptimizables Vector<E> (id=92) elementCount 0 columnsAlwaysOrdered ColumnOrdering (id=94) columns Vector<E> (id=158) myDirection 3 tables Vector<E> (id=159) currentColumnOrdering null ordering Vector<E> (id=96) elementCount 0 unorderedOptimizables Vector<E> (id=97) elementCount 0 (ColumnsAlwaysOrdered in currentRowOrdering is set to Direction: 3 Table 0, Column 3 Table 0, Column 2) The 2nd param assignedTableMap just has {0} because we are only considering 0 in the join order. We return for the call above with true and that qualifies us to consider sort avoidance for m1. One of my question is should m1 get quailified to have sort avoidance set to true when we are not really ordering on any column that comes from m1? Moving on, once we finish with 0,-1,-1 where we decided that sorting can be avoided for m1, we move to the join order 0,1,-1. Again, we come to the code mentioned above for 0,1,-1 if (requiredRowOrdering.sortRequired(currentRowOrdering,assignedTableMap) == RequiredRowOrdering.NOTHING_REQUIRED) If the earlier decision for considering sort avoidance true for m1 was correct(for join order 0,-1,-1), we should definitely return sort avoidance false for join order 0,1,-1 because with m1 as the outermost table, we can't avoid sorting on m0. The 2 params at this point in code are as follows for the call above The first param is currentRowOrdering RowOrderingImpl (id=62) alwaysOrderedOptimizables Vector<E> (id=99) elementCount 0 columnsAlwaysOrdered ColumnOrdering (id=156) columns Vector<E> (id=245) myDirection 3 tables Vector<E> (id=246) currentColumnOrdering ColumnOrdering (id=157) columns Vector<E> (id=252) myDirection 1 tables Vector<E> (id=255) ordering Vector<E> (id=103) elementCount 1 elementData Object[10] (id=259) modCount 3 unorderedOptimizables Vector<E> (id=104) elementCount 0 (ColumnsAlwaysOrdered in currentRowOrdering is set to Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2) (currentColumnOrdering in currentRowOrdering is set to Direction: 1 Table 1, Column 3) (ordering in currentRowOrdering is set to [Direction: 1 Table 1, Column 3]) The 2nd param assignedTableMap just has {0,1} because we are only considering 0 and 1 in the join order. Going further deep in 0,1,-1 join order, the call above leads to following line in impl.sql.compile.OrderByList.sortRequired(RowOrdering, JBitSet) line: 535 if ( ! rowOrdering.alwaysOrdered(cr.getTableNumber())) In this if block, we say that sorting is not required. In my mind, we should required sorting because the outermost table m1 is not going to be able to satisfy the ordering requirement for m0.value and hence we can't consider sort avoidance when m1 is outermost. But of course, code is deciding that sort avoidance is ok in this case. I will highly appreciate if someone can help me understand this piece of code which ends up making the decision for sorting required or not required. I think the code is mostly centered in org.apache.derby.impl.sql.compile.RowOrderingImple (protocol class org.apache.derby.iapi.sql.compile.RowOrdering). There is not much of javadoc at the class level which is making it hard me to grasp what this class is supposed to do and what are the different memebers in it. I am going to debug further on my own to try to understand it but any help from community will be appreciated. I talked to Army offline on this jira entry and following was his feedback on the issue.
He tends to agree that the outermost table m1 should not avoid sorting when the ordering required is on a column that is from an optimizable that is not in the join order yet. The join order is 0,-1,-1 when m1 decides that we can choose a plan which will avoid sorting. The code that makes that decision is in OrderByList.sortRequired(...), lines 496 - 533 Apparently, this code has been there since Cloudscape 2.0 days. **************Start of Army's analysis of the issue******************* The comment preceding the code segment mentioned above says: /* ** Check whether the table referred to is in the table map (if any). ** If it isn't, we may have an ordering that does not require ** sorting for the tables in a partial join order. Look for ** columns beyond this column to see whether a referenced table ** is found - if so, sorting is required (for example, in a ** case like ORDER BY S.A, T.B, S.C, sorting is required). */ The 2nd and 3rd line in comment confuses me. The only time I can think of where we could assume "sort avoidance" was okay when a table number is missing would be if the ORDER BY expression did not refer to any tables, ex. if it was a constant: ORDER BY 'some literal', m0.value But from what I can tell, we catch that during preprocessing and remove the constant, so the above code still wouldn't be useful. (Note: the above ORDER BY will actually cause the query to return the correct results because the presence of a non-column expression (esp. the literal) causes the optimizer to ALWAYS perform a sort--that's a workaround to the problem if the user really needs one...). If the code mentioned above was removed, then I think the optimizer would require sorting for the first optimizable (m1), and that would in turn mean that we have to sort for the entire join order--which should return the correct results. Of course, the thought of just removing code that's been in there for years is a bit scary...It would be nice to understand what the intended use case was, but the comments are not clear about that at all. **************End of Army's analysis of the issue******************* Based on the above feedback, I commented out the code from 496-533 lines in OrderByList and ran the junit tests and they all ran fine. The old harness had one test case failing (lang/wisconsin) and it failed because the query plans now include sorting when the original plans (without my code changes) did not include sorting. It appears that wisconsin test does not check the results of the cursors which are getting prepared. It just opens few cursors in order to get their query plans and dumps the query plans without checking the results of those queries. So I am not sure if the results from those queries have changed because of the additional sorting which is being added into their query plans. Ofcourse, the problem query shown below works fine with the code commenting suggested by Army SELECT table1.id, m0.value, m1.value FROM --DERBY-PROPERTIES joinOrder=FIXED table2 m1 -- DERBY-PROPERTIES index=key3 , table2 m0 -- DERBY-PROPERTIES index=key3 , table1 WHERE table1.id=m0.id AND m0.name='PageSequenceId' AND table1.id=m1.id AND m1.name='PostComponentId' AND m1.value='21857' ORDER BY m0.value; I would like to know what the community thinks of the code removal suggested by Army. Thanks Army for taking the time to look at this, and for pointing us to a good theory!
I like Army's suggestion. I wish I had a clearer grasp on the concept of a 'partial join order'. I think that this is an intermediate stage during optimization, where the optimizer has so far chosen an ordering for some, but not yet all, of the tables. It looks like the code is trying to handle the problem of how to make a sort avoidance check at this intermediate point in optimization. I think that all *chosen* query plans eventually reach a *complete* join order, but a query plan which is *discarded* due to being too expensive may never get beyond a *partial* join order. Is that true? It seems to me that there may be two topics here: 1) it would be valid to make such a check for the purposes of doing some preliminary costing of the join order up to this point, but for such an algorithm, the code would have to then re-visit the sort avoidance decision later in the optimization, once the partial join order had become a full join order. That is, I wonder if the overall flow is something like: - optimizer investigates a partial join order, decides to cost it, determines that (so far) a sort is not required. - optimizer later completes this join order, decides that it is acceptable, but does NOT re-analyze whether a sort is now required for the complete join order - optimizer then chooses the correct full join order, but incorrectly avoids the sort due to the decision it made when considering the partial join order. 2) The comment ("ORDER BY S.A, T.B, S.C") raises the interesting question of the situation in which each column, considered individually, is ordered properly, but because the ORDER BY clause interleaves columns from different tables, a sort is still required. That is, if S had an index on (A, C), and T had an index on B, we might look at ORDER BY S.A, T.B, S.C and think that no sorting of the results was required, because the join from S -> T would emit the rows in the correct order, but that is wrong; the interleaving of the columns means that the sort must still be performed. And, presumably, there is the interesting situation where we need to cost out a query at a point where we have determined a partial join order that contains a position for S, but not T, or vice versa. It would be interesting to know more about the Wisconsin test cases, about the queries involved, and about the before- and after- query plan differences, with respect to sorting. Are *all* the changes due to situations where we formerly avoided a sort, but now we choose one? That's interesting, I think; we want to be careful to avoid introducing un-necessary sorts because that could be a substantial performance regression (of course, if the old query was returning the rows in the wrong order, but the test didn't check, and the new query is now being performed correctly, that's important to know, too!) It would also be interesting to see if we can construct an example along the lines of the ORDER BY S.A, T.B, S.C case, such that various other permutations (ORDER BY S.A, S.C, T.B or ORDER BY T.B, S.A, S.C) did not require sorting, but ORDER BY S.A, T.B, S.C did, and see how the query plans emitted for these various cases behaved. Bryan, I haven't read your comment completely but I noticed " I wish I had a clearer grasp on the concept of a 'partial join order. I think that this is an intermediate stage during optimization, where the optimizer has so far chosen an ordering for some, but not yet all, of the tables. It looks like the code is trying to handle the problem of how to make a sort avoidance check at this intermediate point in optimization." Your understanding of partial join order is correct. For our specific query case, the optimizer is eventually going to have a join order consisting of 3 tables once it determines the best join order. But to arrive to that join order, it goes through many iterations of possible join orders. It starts out by considering only one optimizable in the first position of the join order, eg, 0,-1,-1 and this will be considered partial join order.
Bryan, I think you are right that there might be some partial join orders which will never reach complete join order stage because the partial join order is turning out to be too expensive.
For what it's worth, I agree with everything Bryan wrote in his April 16th comment :)
> I think that all *chosen* query plans eventually reach a *complete* join order, but a query > plan which is *discarded* due to being too expensive may never get beyond a *partial* > join order. Is that true? Yes, that's true. > The comment ("ORDER BY S.A, T.B, S.C") raises the interesting question of the > situation in which each column, considered individually, is ordered properly, but > because the ORDER BY clause interleaves columns from different tables, a sort is > still required. Yes, it does. And as I re-think about this, perhaps the code was written for a situation like "ORDER BY S.A, T.B" -- Note that we do *NOT* have "S.C". In that case a partial join order with "S" would satisfy the first order by column, and the second order by colum, "T.B", would have a table that is not in the join order. Without the logic in question, I think the method would determine that a sort was required because table "T" wasn't found. But the logic in question would see if there was anything 'after' T.B, and since there isn't, it would say that the partial join order can avoid a sort *so far*, with the assumption that if the next optimizable to be placed in the join order is "T", we might be able to avoid the sort entirely. As soon as "S.C" gets added to the list, though, the logic sees that we have interleaving columns and therefore correctly requires a sort. So *if* that's a correct statement of how the code is *supposed* to work, then it is actually quite useful and it does make sense. But there seems to be a glitch in the logic--namely, it should perhaps require that a) all of the tables for the LEADING SET of order by columns, up to the one whose table cannot be found, MUST exist within the join order, *and* b) the leading set of order by columns canNOT be empty. I think the code as written checks for "a", but it does not check for "b". So in the query for this issue, we have "ORDER BY m0.value". When we get a partial join order with { m1 } in it, we check the order by column "m0.value" and find that "m0" is not in the (partial) join order. Today, due to the lack of condition "b", we think we can avoid the sort. But if condition "b" was in place, we would see that the "leading set" of order by columns--i.e. the number of order by columns before "m0.value", is EMPTY, which means that "so far" nothing is sorted, and thus the sort would be required. I haven't actually tried that out, I'm just writing as things occur to me, so this could be incomplete and/or entirely incorrect... > if condition "b" was in place, we would see that the "leading set" of order by
> columns--i.e. the number of order by columns before "m0.value", is EMPTY, > which means that "so far" nothing is sorted, and thus the sort would be required. One easy way to check this theory would be to make the following change: @@ -503,7 +503,7 @@ */ if (tableMap != null) { - if ( ! tableMap.get(cr.getTableNumber())) + if ((position > 0) && !tableMap.get(cr.getTableNumber())) { /* Table not in partial join order */ for (int remainingPosition = loc + 1; and see what happens... > *and* b) the leading set of order by columns canNOT be empty.
Okay, check that, that's not a complete solution. Consider ORDER BY S.A, T.B with a join order of { S, W, - }. If we can avoid sorting for S.A and then we process "T.B" and say that we can avoid the sort there, as well, we'll get into trouble. Once we get to join order { S, W, T } we might find that our access plan for T can avoid the sort for B...but due to the presence of "W" in the join order, I think the results would still end up out of order. So the condition would have to be generalized more than what I posted earlier...I think... > due to the presence of "W" in the join order, I think the results would still end up out of order.
I'm not seeing that, seems like the data from W is irrelevant to the final ordering, but I think it's good to be cautious and verify all of this with some actual tables and some actual queries. Mamta, you might want to mark this issue as assigned to you...? I have worked on writing a junit test which is currently going to fail because we are returning the data in incorrect order. I thought it would be useful to have the test for people to quickly run the test if they wanted to. This junit test will not be part of any suite currently since the bug is not fixed yet. Putting it in the suite is going to make it fail everytime because the test is asserting that the data be returned in the correct order. I will check that test in soon. It was painful to convert the setup script provided for this jira into a junit test. The script is huge. I was able to use Army's test converter
I am working on the wisconsin test so I can print out the test results and see if the code changes suggested by Army affects the data output in anyways.
Hi Mamta, I noticed the grant to ASF checkbox was not marked with the attachment derby-reproduce.zip. Do you think it is ok to turn it into a test? This may be fine since it is just SQL and not java code, but I wanted to just check and make sure.
Thanks for catching that, Kathey. I don't know the answer whether it is ok to create a SQL test out of an attachement which is not marked grant to ASF. Maybe someone else has a definitive answer.
Tars Joris, do you think you can grant the derby-reproduce.zip to ASF? I will find out if the test code can be granted to ASF.
Mamta A. Satoor made changes - 23/Apr/09 04:46 PM
wisconsin test was showing diffs after I commented out the code in OrderByList (code through 504-533). The diffs were for 7 queries and for those 7 queries, now the plan picked does a sorting (prior to my changes, the sorting was getting avoided). wisconsin test only does query plan dump, it does not check the actual data returned for those queries. For the 7 queries that changed, I added a check to dump the data returned from those queries. Rerunning wisconsin with and without my code changes atleast confirms that the data returned because of the additional sorting node has not been affected. All of these 7 queries involved multiple tables in the FROM list and they had ORDER BY clause. (I have included the 7 queries below for reference).
I went through all the queries in wisconsin test and see that there are still quite a few queries (even the ones with more than one table in the FROM list and have ORDER BY) that do not have sorting node on top of their query plan because of the commenting of the code. One of such query example is get cursor c as 'select * from TENKTUP1, TENKTUP2 where TENKTUP1.unique1 = TENKTUP2.unique1 and TENKTUP2.unique1 < 6000 order by TENKTUP1.unique1'; BTW, the 7 queries that have changed their plans so that they now require sorting are as follows(Note that the comment for the query 2 "says that sort avoidance with joins and order by on columns in different tables". Well, with other change in the code, we are not avoiding sort anymore) 1) ij> -- one row from joining table get cursor c as 'select * from TENKTUP1, TENKTUP2 where TENKTUP1.unique2 = TENKTUP2.unique2 and TENKTUP2.unique1 = 0 order by TENKTUP1.unique1'; 2) ij> -- Sort avoidance with joins and order by on columns in different tables -- -- order on joining columns get cursor c as 'select * from TENKTUP1, TENKTUP2 where TENKTUP1.unique1 = TENKTUP2.unique1 order by TENKTUP1.unique1, TENKTUP2.unique1'; 3) ij> get cursor c as 'select * from TENKTUP1, TENKTUP2 where TENKTUP1.unique1 = TENKTUP2.unique1 and TENKTUP1.unique2 = 0 and TENKTUP2.unique2 = 0 order by TENKTUP1.unique1, TENKTUP2.unique1'; 4) ij> get cursor c as 'select * from TENKTUP1, TENKTUP2 where TENKTUP1.unique1 = TENKTUP2.unique1 and TENKTUP1.unique2 < 6000 and TENKTUP2.unique2 = 0 order by TENKTUP1.unique1, TENKTUP2.unique1'; 5) ij> get cursor c as 'select * from TENKTUP1, TENKTUP2 where TENKTUP1.unique1 = TENKTUP2.unique1 and TENKTUP1.unique2 < 6000 and TENKTUP2.unique2 = 0 order by TENKTUP1.unique2, TENKTUP2.unique2'; 6) ij> get cursor c as 'select * from TENKTUP1, TENKTUP2, ONEKTUP where TENKTUP1.unique1 = TENKTUP2.unique1 and ONEKTUP.unique1 = TENKTUP1.unique1 and TENKTUP1.unique2 = 0 and TENKTUP2.unique2 = 0 order by TENKTUP1.unique1, TENKTUP2.unique1'; 7) ij> get cursor c as 'select * from TENKTUP1, TENKTUP2, ONEKTUP where TENKTUP1.unique2 = TENKTUP2.unique2 and ONEKTUP.unique2 = TENKTUP1.unique2 and TENKTUP1.unique2 = 0 and TENKTUP2.unique2 = 0 order by TENKTUP1.unique1, TENKTUP2.unique1'; Thanks Mamta!
Could you perhaps include the join order that was chosen for each of the above queries? I'm not looking for the full query plan, just something short like the { TENKTUP1, TENKTUP2 } notation. You should be able to get the join order by reading the query plan top to bottom; the order in which you see the table names should reflect the join order chosen by the optimizer. Note that the order by clauses for queries #2 thru #7 all match the "ORDER BY S.A, T.B" shape that I mentioned in my first April 16th comment, so if the theory as to how that code is supposed to work was correct, that *might* explain why these queries fail to avoid the sort when the relevant code is commented out...? Of course, more tracing/debugging of the individual queries would be necessary to know for sure... I took a close look at the original query plans for queries 1 through 3, and believe all of the original
query plans with no sort at the top are valid plans. So think the current fix may not be valid as it is causing sort where they do not need to be. I am new to this, so if my logic is wrong please point it out. For query 1: 1) ij> -- one row from joining table get cursor c as 'select * from TENKTUP1, TENKTUP2 where TENKTUP1.unique2 = TENKTUP2.unique2 and TENKTUP2.unique1 = 0 order by TENKTUP1.unique1'; Because TENKTUP2.unique1 is a unique column and the query requires TENKTUP2.unique1 = 0, only 1 or 0 rows qualifies from TENKTUP2. Because TENKTUP2.unique2 and TENKTUP1.unique2 are also both unique then TENKTUP1.unique2 = TENKTUP2.unique2 means only 1 or 0 rows qualify qualify from TENKTUP1. This means the query can only return 1 or 0 rows. Because this is the case there is no need to sort a 1 or 0 row result set. My question is, is it expected that the optimizer should recognize that a one row result set requires no ordering? It looks like queries 3 through 7 are all variations on one row result sets. Query 2 is different: get cursor c as 'select * from TENKTUP1, TENKTUP2 where TENKTUP1.unique1 = TENKTUP2.unique1 order by TENKTUP1.unique1, TENKTUP2.unique1'; In this case the interesting thing is that TENKTUP1.unique1 must be the same as TENKTUP2.unique1. So this means that any plan that is sorted on TENKTUP1.unique1 OR TENKTUP2.unique1 fulfills the order by requirement to be sorted by TENKTUP1.unique1, TENKTUP2.unique1. So in the original queryplan it turns out the outermost index scan is: TENKTUP1 using index TK1UNIQUE1 which because of the predicate is enough to avoid sorting. Again the question is whether the code is meant to catch this case or did it get lucky? Originally I have to admit I didn't understand how any multiple table order by could be a sort avoidance plan since we have no indexes that cover multiple tables. I believe the above show some of the cases. Another case I have seen is an order by of A.key, B.key being satisfied by join on A.key, B.key when it is known A.key is a single value. Or satisfied also by join on B.key, A.key where it is also known that A is a single value like A.key = ?. I haven't looked at the details of your previous post, but regarding:
> So think the current fix may not be valid as it is causing sort where they do not need to be. I agree. That's what I was hoping to convey in my comment on April 16th when I wrote: > *if* that's a correct statement of how the code is *supposed* to work, then it is actually > quite useful and it does make sense. I.e. we shouldn't remove it. As for: > is it expected that the optimizer should recognize that a one row result set requires > no ordering? I don't know the details of how it works, but there is definitely logic in the optimizer to try to recognize one row result sets and to make decisions based on that information. The RowOrderingImpl.java class even includes a field called "alwaysOrderedOptimizables" for which the javadoc says: /* ** This vector contains table numbers for tables that are always ordered. ** This happens for one-row tables. */ Not sure how that plays into the above queries, but I thought I'd mention it... > Another case I have seen is an order by of A.key, B.key being satisfied by join on > A.key, B.key when it is known A.key is a single value Yes, I think that's what the "columnsAlwaysOrdered" field in RowOrderingImpl seems to be for. Any columns which are compared to constants (and probably parameter markers as well?) are considered "always ordered" and hence should not require sorting in and of themselves. Ex. In the query for this issue the columnsAlwaysOrdered field includes columns m0.name, m1.name, and m1.value because all of those columns are compared to literals in the WHERE clause. It seems quite possible that the optimizer will apply these rules transitively to detect other "always ordered" columns as well... Test script that can be included in the Test-Suite.
Some data was changed, but I verified that it still reproduced the bug.
Tars Joris made changes - 27/Apr/09 07:22 AM
I have changed the test to use the test script submitted by Tars this morning. Thanks for the updated script, Tars Joris.
I modified the last script provided by Tars to create a table identical to table2 and the new table is called table3. The original query for Tars test case which returns incorrect order of rows used a self join on table2 along with table1. To avoid any possibilities of problems associated with may be self join, I created this table3 which is identical to table2. But despite that, the query is still returning incorrect results. Here is how the new query (without the self join) looks like.
SELECT table1.id, table2.value, table3.value FROM table1, table2, table3 WHERE table1.id=table2.id AND table2.name='PageSequenceId' AND table1.id=table3.id AND table3.name='PostComponentId' AND table3.value='21857' ORDER BY table2.value; The database can be setup to run the query above is enclosed as script3.sql to this jira entry.
Mamta A. Satoor made changes - 28/Apr/09 09:49 PM
I am attaching a newer version of script3.sql The new script is called script3WithUserFriendlyIndexNames.sql The only changes made to script3.sql are to use user-friendly index names so that it is easier to understand the query which is using index names through optimizer overrides to demonstrate the buggy behavior. Hopefully it will be easier to read the query plan as well. The query to see the problem is as follows
SELECT table1.id, table2.value, table3.value FROM --DERBY-PROPERTIES joinOrder=FIXED table3 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table3 , table2 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table2 , table1 WHERE table1.id=table2.id AND table2.name='PageSequenceId' AND table1.id=table3.id AND table3.name='PostComponentId' AND table3.value='21857' ORDER BY table2.value; The query plan for the query above looks as follows Statement Name: null Statement Text: SELECT table1.id, table2.value, table3.value FROM --DERBY-PROPERTIES joinOrder=FIXED table3 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table3 , table2 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table2 , table1 WHERE table1.id=table2.id AND table2.name='PageSequenceId' AND table1.id=table3.id AND table3.name='PostComponentId' AND table3.value='21857' ORDER BY table2.value Parse Time: 0 Bind Time: 0 Optimize Time: 0 Generate Time: 0 Compile Time: 0 Execute Time: 0 Begin Compilation Timestamp : null End Compilation Timestamp : null Begin Execution Timestamp : null End Execution Timestamp : null Statement Execution Plan Text: Project-Restrict ResultSet (10): Number of opens = 1 Rows seen = 3 Rows filtered = 0 restriction = false projection = true constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 restriction time (milliseconds) = 0 projection time (milliseconds) = 0 optimizer estimated row count: 0.20 optimizer estimated cost: 1185.66 Source result set: User supplied optimizer overrides for join are { joinOrder=FIXED } Nested Loop Exists Join ResultSet: Number of opens = 1 Rows seen from the left = 3 Rows seen from the right = 3 Rows filtered = 0 Rows returned = 3 constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 optimizer estimated row count: 0.20 optimizer estimated cost: 1185.66 Left result set: User supplied optimizer overrides for join are { joinOrder=FIXED } Nested Loop Join ResultSet: Number of opens = 1 Rows seen from the left = 3 Rows seen from the right = 3 Rows filtered = 0 Rows returned = 3 constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 optimizer estimated row count: 0.20 optimizer estimated cost: 1185.35 Left result set: Project-Restrict ResultSet (5): Number of opens = 1 Rows seen = 3 Rows filtered = 0 restriction = true projection = true constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 restriction time (milliseconds) = 0 projection time (milliseconds) = 0 optimizer estimated row count: 0.20 optimizer estimated cost: 6.97 Source result set: Index Row to Base Row ResultSet for TABLE3: Number of opens = 1 Rows seen = 3 Columns accessed from heap = {0, 1, 2} constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 optimizer estimated row count: 0.20 optimizer estimated cost: 6.97 User supplied optimizer overrides on TABLE3 are { index=NONUNIQUEONVALUE_TABLE3 } Index Scan ResultSet for TABLE3 using index NONUNIQUEONVALUE_TABLE3 at read committed isolation level using share row locking chosen by the optimizer Number of opens = 1 Rows seen = 3 Rows filtered = 0 Fetch Size = 1 constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 next time in milliseconds/row = 0 scan information: Bit set of columns fetched=All Number of columns fetched=2 Number of deleted rows visited=0 Number of pages visited=2 Number of rows qualified=3 Number of rows visited=4 Scan type=btree Tree height=-1 start position: >= on first 1 column(s). Ordered null semantics on the following columns: stop position: > on first 1 column(s). Ordered null semantics on the following columns: qualifiers:None optimizer estimated row count: 0.20 optimizer estimated cost: 6.97 Right result set: Project-Restrict ResultSet (8): Number of opens = 3 Rows seen = 8688 Rows filtered = 8685 restriction = true projection = true constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 restriction time (milliseconds) = 0 projection time (milliseconds) = 0 optimizer estimated row count: 0.20 optimizer estimated cost: 1178.38 Source result set: Index Row to Base Row ResultSet for TABLE2: Number of opens = 3 Rows seen = 8688 Columns accessed from heap = {0, 1, 2} constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 optimizer estimated row count: 0.20 optimizer estimated cost: 1178.38 User supplied optimizer overrides on TABLE2 are { index=NONUNIQUEONVALUE_TABLE2 } Index Scan ResultSet for TABLE2 using index NONUNIQUEONVALUE_TABLE2 at read committed isolation level using share row locking chosen by the optimizer Number of opens = 3 Rows seen = 8688 Rows filtered = 0 Fetch Size = 1 constructor time (milliseconds)= 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 next time in milliseconds/row =0 scan information: Bit set of columns fetched={1} Number of columns fetched=1 Number of deleted rows visited=0 Number of pages visited=12 Number of rows qualified=8688 Number of rows visited=8688 Scan type=btree Tree height=2 start position: None stop position: None qualifiers:None optimizer estimated row count: 0.20 optimizer estimated cost: 1178.38 Right result set: Index Scan ResultSet for TABLE1 using constraint SQL090429102526750 at read committed isolation level using share row locking chosen by the optimizer Number of opens = 3 Rows seen = 3 Rows filtered = 0 Fetch Size = 1 constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 next time in milliseconds/row = 0 scan information: Bit set of columns fetched={0} Number of columns fetched=1 Number of deleted rows visited=0 Number of pages visited=3 Number of rows qualified=3 Number of rows visited=3 Scan type=btree Tree height=1 start position: >= on first 1 column(s). Ordered null semantics on the following columns:0 stop position: > on first 1 column(s). Ordered null semantics on the following columns:0 qualifiers: Column[0][0] Id: 0 Operator: = Ordered nulls: false Unknown return value: false Negate comparison result: false optimizer estimated row count: 0.20 optimizer estimated cost: 0.31
Mamta A. Satoor made changes - 29/Apr/09 06:12 PM
I went through the optimize phase through the debugger and it appears to me (I may be wrong and would appreciate others looking at my detail analysis of the optimize phase below) that the problem may be with the generate phase or the execute phase where we may be not using the non-unique index on table2 correctly to fetch the orders in row.
The query in question is as below SELECT table1.id, table2.value, table3.value FROM --DERBY-PROPERTIES joinOrder=FIXED table3 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table3 , table2 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table2 , table1 WHERE table1.id=table2.id AND table2.name='PageSequenceId' AND table1.id=table3.id AND table3.name='PostComponentId' AND table3.value='21857' ORDER BY table2.value; For the query above, in addition to the predicates supplied by the user, optimizer internally generates another predicate, namely, table3.id=table2.id So for the queyr, all the predicates are as follows 1)table1.id=table2.id 2)table1.id=table3.id 3)table3.id=table2.id 4)table2.name='PageSequenceId' 5)table3.name='PostComponentId' 6)table3.value='21857' Of the predicates above, 4), 5) and 6) can be pushed down to the corresponding optimizables ie 4) will be associated with table2 and 5),6) will be associated with table3. This is because these predicates are constant comparison with columns. This leaves us with 3 predicates, namely 1), 2), 3) which are multitable join predicates. Optimizer has a class called RowOrdering associated with it (OptimizerImpl.currentRowOrdering). currentRowOrdering has following fields in it. currentRowOrdering RowOrderingImpl alwaysOrderedOptimizables Vector<E> columnsAlwaysOrdered ColumnOrdering currentColumnOrdering null ordering Vector<E> unorderedOptimizables Vector<E> All the predicates that are constant comparison will go into columnsAlwaysOrdered. These pushing of constant comparison predicates happen per optimizable basis when that particular optimizable is being consdiered in the possible join order combination. In our specific query, through optimizer overrides, we have instructed optimizer to only consider join order [table3, table2, table1]. The optimizer starts with [table3, -1, -1]. First thing it does is it goes through the join predicates (which are 1), 2) and 3) in the predicate list above). But since all the referenced tables for any of the 3 predicates are not covered by the current join order of [table3, -1, -1], nothing gets done to those join predicates. Next, the optimizer will tell currentRowOrdering to (this happens in FromBaseTable(FromTable).tellRowOrderingAboutConstantColumns(RowOrdering, OptimizablePredicateList) line: 1477) to add predicates 5) and 6) from above list into it's columnsAlwaysOrdered list. So, at the end of the [table3, -1, -1], currentRowOrdering.columnsAlwaysOrdered will look as follows Direction: 3 Table 0, Column 3 Table 0, Column 2 We are saying above that Table at position 0 (which is Table3 in our eg) has column 3(value) and column 2(name) which are always ordered because they are being compared with constants. So far, the logic for currentRowOrdering seems to be working fine. Next, we have asked the optimizer to use the index index=nonUniqueOnValue_Table3 on Table3. This index covers the predicate 6) since that predicate is on the same column on which the index is created but it does not cover the other columns from table3 that are being referenced in this query (which table3.id and table3.name). Because of this, we determine that the index being considered is not a covering index. The code to determine whether the sorting can be avoided for [table3, -1, -1], is in OrderByList.sortRequired(RowOrdering, JBitSet) method. Since order by is on table2.value, the order by column's table does not match with table3 and hecne we determine that sorting is not required based on what optimizer has seen so far. So it appears that we leave it to table2 when its turn comes in the join order to decide whether sorting should indeed be avoided or not. Next we consider the join order [table3, table2, -1]. For table2, we have asked the optimizer to use index=nonUniqueOnValue_Table2. First thing that we do is go through the join predicates 1), 2) and 3). Predicate number 3) which is TABLE3.ID = TABLE2.ID can be pushed down to optimizable table2 because the current join order [table3, table2, -1] includes the tables referenced by predicate 3). So, at this point, there are 2 predicates pushed down to table3, they are number 5) and 6). And for table2, there are 2 prdicates pushed down to it, they are number 3) and 4). Also, since predicate 4) is a constant comparison, it will get added to currentRowOrdering. At this point, currentRowOrdering.columnsAlwaysOrdered will look as folows Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2 We are saying above that Table at position 0 (which is Table3 in our eg) has column 3(value) and column 2(name) which are always ordered because they are being compared with constants. In addition, Table at position 1(which is Table2 in our join order) has column 2 which is always ordered because it is being compared with constant. Next, we have asked the optimizer to use the index nonUniqueOnValue_Table2 but it does not cover the constant comparison predicate 4) since that predicate is on column name and not value. Notice, this is a different code path we are following for table2 compared to table3 above. Because table3.value is not already an ordered column in currentRowOrdering because there is no constant comparison predicate on it, we add it to the "ordering " vector in currentOrdering object. This is the first object that gets added to the currentRowOrdering."ordering" vector in our eg. So, at this point, the currentRowOrdering has only 3 of it's fields propulated and they are as follows columnsAlwaysOrdered ColumnOrdering Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2 currentColumnOrdering ColumnOrdering Direction: 1 Table 1, Column 3 ordering Vector<E> [Direction: 1 Table 1, Column 3] The index nonUniqueOnValue_Table2 does not cover any predicate on Table2 and it does not cover all the column from table2 that are being referenced in this query and hence it is not a covering index. Next, the code to determine whether sort can be avoided for join order [table3, table2, -1], we go through the code path in OrderByList.sortRequired(RowOrdering, JBitSet) method. We find that the order by column's table matches with table2 in join order. Because of this match, we need to look at currentRowOrdering to see if it will take care of the sorting and if so we can avoid the sort. To look into currentRowOrdering, we first call currentRowOrdering.alwaysOrdered(cr.getTableNumber()) (in this call, cr is the order by column). So, we are checking if table2 is always ordered in currentRowOrdering. Since table2 is not always ordered in this query, this check returns false. Next, we check if not the entire table, is the order by table.order by column combination always ordered in currentRowOrdering. In our query, that will be table2.value Since there is no constant comparison predicate on table2.value, it is not going to be in columnsAlwaysOrdered vector in currentRowOrdering. For reference, currentRowOrdering looks as follws columnsAlwaysOrdered ColumnOrdering Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2 currentColumnOrdering ColumnOrdering Direction: 1 Table 1, Column 3 ordering Vector<E> [Direction: 1 Table 1, Column 3] As we can see from currentRowOrdering object above, columnsAlwaysOrdered does not include Table 1, Column 3. So, we have not found table2 to be always ordered and we have not found table2.value to be always ordered either. The last place to check is the ordering vector in columnsAlwaysOrdered. This vector does include Table 1, Column 3 which is table2.value and hence we determine that sorting is not needed to table2. All this code of checking the columnsAlwaysOrdered happens in OrderByList.sortRequired(RowOrdering, JBitSet). Assuming that this code is working as intended, I think then the culprit might be when we generate the code. The only step in optimize left is to add table1 to the join order. So, at the end of the optimize phase, the join order will look as follows [table3, table2, table1] and currentRowOrdering looks as follows columnsAlwaysOrdered ColumnOrdering Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2 currentColumnOrdering ColumnOrdering Direction: 1 Table 2, Column 1 ordering Vector<E> [Direction: 1 Table 1, Column 3, Direction: 1 Table 2, Column 1] The only change to currentRowOrdering that is caused by adding of table1 in third join order position is that we are going to use primary key on table1 and hence we need to reflect that in currentRowOrdering by adding it to the ordering vector. Let me know if I am understanding what is going on.
To me what looks like is happening is that the optimizer is looking at each of the join nodes and says the following: table3 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table3 : always ordered because we only get rows where table3.value = 21857 but note that it is a non-unique key so multiple rows can come back. table2 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table2 here it see's that an index scan on table2 will return keys in table2.valu order. Now it seems like the optimizer is saying that since table3 is always ordered that it can then consider the result set to be solely ordered on table2.value. This would be a correct assumption if table3 returned a single row, but not correct when it returns multiple rows. In the bug case it looks to me like table3 returns 3 rows where table3.value = 21857. table3.id | table3.value ---------------------------------------------- 2147483653 |21857 2147483654 |21857 4294967297 |21857 Now the next part of the join has been forced to use table2.nonUniqueOnValue_Table2, which is not useful other than providing sorted access, so it turns into a table scan. The key is that it turns into 3 table scans, searching for table3.id=table2.id leading to: ID |table2.VALUE |VALUE ---------------------------------------------- 2147483653 |000002 |21857 2147483654 |000003 |21857 4294967297 |000001 |21857 The code does THREE full scan using the index and thus the rows are not ordered by table2.value. So even though we are using a key that should give us the correct sorted order on table2.value, we are traversing it more than once so it does not matter. If the probes had been done using a single scan with something like where (id = 2147483653 or 2147483654 or 4294967297) it would have been sorted correctly, but that is not how joins work. I believe the execution code is doing the right thing. It seems to me that the optimizer code is incorrectly using the "always sorted information" incorrectly. Mike, thanks for your time on this jira. Yes, your understanding of my description is correct. I think the key here is that the outer table(table3) is returning more than one row and each one of those row is requiring us to look at the middle table (table2) which results into 3 scans on table2. So even though, table2.value has index on it, it is not helping in this case because of 3 different scans on table2. If it were just one scan on table2, then the index table2.value would have returned the rows to us in proper order. The 3 different scans would require a sorting on them to return the rows in sorted order.
Attaching another SQL file, d3926_repro.sql, which reproduces the problem (for me) with a simpler set of tables and data.
This repro was motivated by the observation made by Mike and re-iterated by Mamta, namely: > the key here is that the outer table(table3) is returning more than one row and > each one of those row is requiring us to look at the middle table (table2) which > results into 3 scans on table2 Put differently, the outer table is the one which is "driving" the iteration because a) each row from the outer table leads to a scan on the middle table, and b) the optimizer determines that no sort is necessary. Thus the order of the result is based solely on the order of the rows that are retrieved from the outer table. Some observations about what was necessary to get this particular repro to work: 1) The outer table (T1) has a non-unique index on column I1, and we force the optimizer to use that non-unique index. Why? If the optimizer were to use a table scan for T1 then we would check to see if T1 was a "oneRowResultSet", which it isn't (and can't be, since we need T1 to return multiple rows in order to satisfy observation 3 below). Since it's not a one-row result set T1 would then get added to the list of "unordered optimizables" for the join order--and if that list has at least one optimizable in it, we would end up doing an explicit sort and thus the problem would not repro. So the plan for T1 must use an index. 2) There is a predicate in the WHERE clause which compares the non-unique indexed column T1.I1 to a CONSTANT expression. Why? If such a predicate did not exist then T1.I1 would be added as the first column in the "rowOrdering" for the join order. Then when the optimizer adds the middle table (T2) to the join order, it would see that the index for T2 does *not* satisfy the ordering requirement of T1.I1, which means we would end up doing an explicit sort for the whole plan. So the problem would not repro. By adding a predicate to compare T1.I1 with a constant, we effectively make T1.I1 "always ordered" and so we do not need to add it to the row ordering. 3) The outer table (T1) has multiple rows which have the same value for the indexed column T1.I1. Why? The presence of multiple rows is important because that's what leads to multiple scans on the middle table (as pointed out by Mike and Mamta). So we need to have a predicate which compares to a fixed constant value (observation 2), but we also need that predicate to return multiple rows. Thus there must be multiple rows in T1 which have I1 column values that equal the constant value used in the predicate. (This is why the index for T1 must be non-unique.) 4) The middle table (T2) has an index that is ordered the same way as the ORDER BY clause. We force the optimizer to use that index for T2. Why? If the optimizer is using an index that satisfies the ordering requirement for the query, it will try to avoid sorting the resultant rows. Sort avoidance is key to reproducing the reported behavior--esp. the optimizer *thinks* it can avoid the sort, and does so, but in truth it should *not* have done so. 5) The index for T2 is not covering--and esp. it does *not* include the column T2.I2 that is used for joining with the outer table. Why? The fact that the index is non-covering means that we will have to go to the T2 table conglomerate to fetch the row that has the current join value--and access to the table conglomerate is _unordered_. 6) The column from T1 (outer table) that is joined with T2 (middle table) has varied values for each of the rows. Why? The presence of different values in T1.J1 means that we will scan T2's table conglomerate multiple times for different T2.I2 values, and since the table conglomerate is unordered (observation 5), those multiple scans will return the rows in an order that does *not* match the index order. 7) The rows that are inserted into T2 are inserted in an order that does NOT match the ORDER BY ordering. Why? It appears that, when inserting rows into a table, the order of the rows in the base table conglomerate generally matches the insertion order. I don't think there are any guarantees of that, but that's what I observed for this simple data. So if we were to insert the rows in proper order, access to the base table conglomerate (observations 5 & 6) might in fact return the rows in the desired order by accident, which would hide the problem that we're trying to reproduce. With all of those observations in place, I was able to write the attached script to reproduce the problem for me. The results I see when I run are: J1 |J2 |J3 --------------------------- 0 |f |five 1 |g |six 2 |e |four but the query specifies "ORDER BY t2.j2", so the rows are in the wrong order. It would be great if others could try to run the script to make sure they see the same behavior (if not, then some or all of this comment may in fact be wrong or incomplete...). | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
output.txt
script.sql
sysinfo.txt