Issue Details (XML | Word | Printable)

Key: DERBY-3214
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Minor Minor
Assignee: A B
Reporter: A B
Votes: 1
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Derby

Optimizer can see negative cost estimates when pulling Optimizables from the join order.

Created: 18/Nov/07 01:36 AM   Updated: 31/Jan/08 04:40 PM
Return to search
Component/s: SQL
Affects Version/s: 10.2.1.6, 10.2.2.0, 10.3.1.4
Fix Version/s: 10.3.3.0, 10.4.1.3

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works d3214_followup_1.patch 2007-12-15 08:52 PM A B 2 kB
Text File Licensed for inclusion in ASF works d3214_v1.patch 2007-12-06 08:00 PM A B 3 kB
Java Source File Licensed for inclusion in ASF works doubleAdd.java 2007-11-18 01:38 AM A B 1.0 kB
Issue Links:
Reference
 

Resolution Date: 12/Dec/07 04:19 PM


 Description  « Hide
When iterating through join order permutations for a query the optimizer places "Optimizables" (FromTables) into a join order, estimates the cost, then "pulls" Optimizables out of the join order and re-places them in different positions. For details see:

  http://wiki.apache.org/db-derby/JoinOrderPermutations

As optimizables are added to the join order the optimizer keeps track of the accumulated cost estimate for the join order. Then when an Optimizable is removed ("pulled") from the join order, the optimizer substracts that Optimizable's cost from the total accumulated cost.

In certain cases (esp. with very large queries) it's possible that the cost for some Optimizable OPT_A is so large that adding it to the accumulated cost of the join order leads to loss of the previous sum. This happens due to normal Java addition of double values, see "doubleAdd.java" attached.

As an example, assume our current join order is:

  { OPT_0, OPT_1, -- }

and that the estimated costs for OPT_0 and OPT_1 are 700 and 800, respectively. The accumulated cost for OPT_0 and OPT_1 is then 700 + 800 = 1500. Then assume we place OPT_A into the final position in the join order:

  { OPT_0, OPT_1, OPT_A }

If the cost of OPT_A is something that is orders of magnitude larger than 1500, then by adding it to 1500 we will effectively "lose" the 1500. Let's say the cost of OPT_A is estimated to be 3.14E50 (which is actually possible, esp. as a result of DERBY-1905). The size of OPT_A's cost makes the cost of OPT_0 + OPT_1 insignificant when using Java doubles (see attached doubleAdd.java):

  1500 + 3.14E50 = 3.14E50

So the total accumulated cost for the join order is now 3.14E50. Later, when we go to pull OPT_A from the join order, we'll subtract its cost from the accumulated cost, yielding:

  3.14E50 - 3.14E50 = 0

Notice how the accumulated cost, which is supposed to represent the cost of OPT_0 plus the cost of OPT_1, is now ZERO. And our join order goes back to:

  { OPT_0, OPT_1, -- }

Next we pull OPT_1 from the join order, which means we have to subtract it's cost from the accumulated cost:

  0 - 800 = -800

So we end up with a negative accumulated cost, which is WRONG. (Actually, the ZERO accumulated cost in the previous step was wrong; this is just a side effect).

As it turns out, there is code in OptimizerImpl that tries to account for negative costs when the negative value comes from normal imprecise arithmetic. In particular we see the following in the code that pulls Optimizables:

    double newCost = currentCost.getEstimatedCost();
    if (pullCostEstimate != null)
    {
        pullCost = pullCostEstimate.getEstimatedCost();
        newCost -= pullCost;

        /*
         ** It's possible for newCost to go negative here due to
         ** loss of precision.
         */
         if (newCost < 0.0)
             newCost = 0.0;

        ...
    }

This code hides the error mentioned above because when it sees "-800" it assumes that the negative stems from normal loss of precision. So the cost for the plan is incorrectly set to "0", which makes it cheaper than any other plan thus far (and probably cheaper than anything to come), and therefore the optimizer will probably choose the wrong plan.

I think the check for negative newCost is only valid if the join position that we're pulling is 0--i.e. if we just pulled the first optimizable in the join order. In that case the accumulated cost *should* be zero, so checking for a negative value and setting it to zero is fine--we're just accounting for the loss of precision that is mentioned in the current code comments.

Note that the same issue also exists for "sort avoidance costs", but in that code there is (currently) no check for negative costs. So if a situation as described above occurs in the current code when the cost of OPT_A is for a sort avoidance plan, the code will throw an ASSERTION failure because the cost should be non-negative.

I noticed this behavior somewhat accidentally while testing out a fix for DERBY-3023: with my attempted fix applied, the query "new-style-sql.txt" was failing with an ASSERTION failure due to the negative cost estimate. Hence this jira.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
A B added a comment - 18/Nov/07 01:38 AM
Attaching a quick patch, d3214_notTested_v1.patch, to "recover" the lost sum and avoid negative cost estimates. I have *not* run regression tests on this yet, I'm just posting a quick fix that resolved for the problem for me while I was looking at DERBY-3023.

A B added a comment - 06/Dec/07 08:00 PM
I ran suites.All and derbyall with d3214_notTested_v1.patch applied and there were no failures. So I'm re-attaching the patch with "notTested" removed from its name.

Thomas Nielsen added a comment - 06/Dec/07 10:09 PM
A B, your patch looks good from what I can tell.
I've started suites.All but don't expect any problems from this patch.

A B added a comment - 07/Dec/07 04:35 PM
Thank you for taking the time to look at the patch, Thomas. Did suites.All run cleanly for you with the patch applied?

Thomas Nielsen added a comment - 07/Dec/07 04:50 PM
My suites.All did show some errors, but I haven't had the time to investigate wether it's related to your patch or not as of now.

Thomas Nielsen added a comment - 08/Dec/07 05:08 PM
I get the same errors both with and without your patch A B, so it most likely not related. Since your tests passed without issues, I'm +1 for commit.

A B added a comment - 12/Dec/07 04:18 PM
Thank you for the review, Thomas. I committed the patch to trunk with svn # 603659:

  http://svn.apache.org/viewvc?rev=603659&view=rev

A B added a comment - 12/Dec/07 04:19 PM
Marking as resolved. Will wait a few days to see if any issues arise in the tinderbox runs before closing.

A B added a comment - 15/Dec/07 08:52 PM
There was a glitch in the first patch: when recovering the cost for the sort avoidance plan, we need to look at the sort avoidance costs (not the default "best access" costs). Attaching d3214_followup_1.patch, which I committed with svn # 604513:

  http://svn.apache.org/viewvc?rev=604513&view=rev

A B added a comment - 24/Jan/08 05:08 PM
I just noticed that this patch was not ported to 10.3, even though the issue that exposes it (DERBY-3023) was. I merged the changes without problem and am now running the tests. If all goes well, I plan to commit to 10.3 later today.

A B added a comment - 24/Jan/08 11:20 PM
derbyall and suites.All ran cleanly with this patch using 10.3 jars and ibm142. So I committed the changes to 10.3 with svn # 615076:

  URL: http://svn.apache.org/viewvc?rev=615076&view=rev

Adding 10.3 to the Fix In list. If no problems arise in 10.3 tinderbox over the next couple of days, I plan to close this issue.

A B added a comment - 31/Jan/08 04:40 PM
No apparent issues after port to 10.3, as tinderbox and daily test results for 10.3 have been clean for several days now. So I'm closing the issue.