Uploaded image for project: 'Derby'
  1. Derby
  2. DERBY-3214

Optimizer can see negative cost estimates when pulling Optimizables from the join order.

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • 10.2.1.6, 10.2.2.0, 10.3.1.4
    • 10.3.3.0, 10.4.1.3
    • SQL
    • None

    Description

      When iterating through join order permutations for a query the optimizer places "Optimizables" (FromTables) into a join order, estimates the cost, then "pulls" Optimizables out of the join order and re-places them in different positions. For details see:

      http://wiki.apache.org/db-derby/JoinOrderPermutations

      As optimizables are added to the join order the optimizer keeps track of the accumulated cost estimate for the join order. Then when an Optimizable is removed ("pulled") from the join order, the optimizer substracts that Optimizable's cost from the total accumulated cost.

      In certain cases (esp. with very large queries) it's possible that the cost for some Optimizable OPT_A is so large that adding it to the accumulated cost of the join order leads to loss of the previous sum. This happens due to normal Java addition of double values, see "doubleAdd.java" attached.

      As an example, assume our current join order is:

      { OPT_0, OPT_1, -- }

      and that the estimated costs for OPT_0 and OPT_1 are 700 and 800, respectively. The accumulated cost for OPT_0 and OPT_1 is then 700 + 800 = 1500. Then assume we place OPT_A into the final position in the join order:

      { OPT_0, OPT_1, OPT_A }

      If the cost of OPT_A is something that is orders of magnitude larger than 1500, then by adding it to 1500 we will effectively "lose" the 1500. Let's say the cost of OPT_A is estimated to be 3.14E50 (which is actually possible, esp. as a result of DERBY-1905). The size of OPT_A's cost makes the cost of OPT_0 + OPT_1 insignificant when using Java doubles (see attached doubleAdd.java):

      1500 + 3.14E50 = 3.14E50

      So the total accumulated cost for the join order is now 3.14E50. Later, when we go to pull OPT_A from the join order, we'll subtract its cost from the accumulated cost, yielding:

      3.14E50 - 3.14E50 = 0

      Notice how the accumulated cost, which is supposed to represent the cost of OPT_0 plus the cost of OPT_1, is now ZERO. And our join order goes back to:

      { OPT_0, OPT_1, -- }

      Next we pull OPT_1 from the join order, which means we have to subtract it's cost from the accumulated cost:

      0 - 800 = -800

      So we end up with a negative accumulated cost, which is WRONG. (Actually, the ZERO accumulated cost in the previous step was wrong; this is just a side effect).

      As it turns out, there is code in OptimizerImpl that tries to account for negative costs when the negative value comes from normal imprecise arithmetic. In particular we see the following in the code that pulls Optimizables:

      double newCost = currentCost.getEstimatedCost();
      if (pullCostEstimate != null)

      { pullCost = pullCostEstimate.getEstimatedCost(); newCost -= pullCost; /* ** It's possible for newCost to go negative here due to ** loss of precision. */ if (newCost < 0.0) newCost = 0.0; ... }

      This code hides the error mentioned above because when it sees "-800" it assumes that the negative stems from normal loss of precision. So the cost for the plan is incorrectly set to "0", which makes it cheaper than any other plan thus far (and probably cheaper than anything to come), and therefore the optimizer will probably choose the wrong plan.

      I think the check for negative newCost is only valid if the join position that we're pulling is 0-i.e. if we just pulled the first optimizable in the join order. In that case the accumulated cost should be zero, so checking for a negative value and setting it to zero is fine-we're just accounting for the loss of precision that is mentioned in the current code comments.

      Note that the same issue also exists for "sort avoidance costs", but in that code there is (currently) no check for negative costs. So if a situation as described above occurs in the current code when the cost of OPT_A is for a sort avoidance plan, the code will throw an ASSERTION failure because the cost should be non-negative.

      I noticed this behavior somewhat accidentally while testing out a fix for DERBY-3023: with my attempted fix applied, the query "new-style-sql.txt" was failing with an ASSERTION failure due to the negative cost estimate. Hence this jira.

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            army A B
            army A B
            Votes:
            1 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment