DERBY-1905

Optimizer cost estimates for subqueries are way (way) too high.

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.1.3.2, 10.2.1.6, 10.3.1.4
    • Fix Version/s: None
    • Component/s: SQL
    • Urgency: Normal
    • Issue & fix info: High Value Fix, Repro attached
    • Bug behavior facts: Performance

      Description

      If I run the attached repro (pulled from DERBY-1866) with derby.optimizer.noTimeout=true (meaning the optimizer should take all the time it needs to try out all possible plans within the restrictions of the optimizer overrides), the estimated cost shown in derby.log (if I log the query plan) is over 600k and the estimated row count is over 520k.

      If, as the code in OptimizerImpl seems to expect, the unit for a cost estimate is milliseconds, then the optimizer here is guessing that the query will take over 10 minutes to execute and will return a half-million rows. But in truth the combined time for compilation AND execution is just a couple of seconds, and only 15 rows are returned.
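
      For reference, the setup that produces this logging can be reproduced with two standard Derby diagnostic properties (assuming they are placed in derby.properties, or set as system properties, before running the repro through ij):

          # derby.properties -- let the optimizer consider all candidate plans
          # and write the chosen plan, with its estimated cost and estimated
          # row count, to derby.log
          derby.optimizer.noTimeout=true
          derby.language.logQueryPlan=true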

      That suggests to me a rather serious problem with the optimizer cost estimates for subqueries.

      Among other things this can have a major impact on the optimizer's timeout mechanism for very deeply-nested queries. The optimizer will continue to try out different combinations of indexes/joinOrders/joinStrategies until it either exhausts them all or until the number of milliseconds spent optimizing is greater than the "best cost" estimate so far. In the case of the repro for this issue, the optimizer quickly exhausts all of the options and so it finishes in a reasonable amount of time.
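
      As a rough, self-contained sketch of the timeout rule described above (plain Java with invented names; this is not the actual OptimizerImpl code), the interaction with inflated estimates looks like this:

          import java.util.List;

          // Illustrative only: models "stop optimizing once the milliseconds spent
          // exceed the best cost estimate found so far".  All names are invented.
          public class TimeoutSketch {
              static double pickBestCost(List<Double> candidateCosts) {
                  long start = System.currentTimeMillis();
                  double bestCost = Double.MAX_VALUE;
                  for (double cost : candidateCosts) {
                      if (cost < bestCost) {
                          bestCost = cost;
                      }
                      // If the unit really is milliseconds, a "best" estimate of
                      // 600,000 lets this loop run for ten minutes before the
                      // timeout check ever fires.
                      if (System.currentTimeMillis() - start > bestCost) {
                          break;
                      }
                  }
                  return bestCost;
              }
          }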

      But in larger queries where there are far more combinations to try (see esp. queries attached to DERBY-1205, DERBY-1777), these severely inflated cost estimates get very large very quickly (sometimes even reaching infinity--see DERBY-1259, DERBY-1260) and so the optimizer just keeps on optimizing and never times out. The result is that for queries like those in DERBY-1777, the optimizer can spend literally hours optimizing a query which then executes in a matter of seconds.

      I'm still investigating this, but preliminary examination suggests that part of the problem is in some way related to the treatment of "outer costs" by the optimizer--and in particular, it looks like the optimizer is perhaps too aggressive in multiplying cost estimates by row counts pulled from "outer costs". That's just my first guess at the problem, though; there could of course be far more to it than that...


          Activity

          A B added a comment -

          Attaching file referenced in the description for this issue.

          Bryan Pendleton added a comment -

          Can the code detect, at execution time, how accurate its estimate was?

          If so, it strikes me that it would be interesting to instrument the code, perhaps under
          SanityManager control, so that if it finds out at execution time that it made a
          particularly poor estimate, it dumps some information such as:

          • here's the query plan I chose
          • here's what I estimated it would take
          • here's what it actually took

          Then we could run a variety of tests (to start with, just run derbyall with this instrumentation)
          and see if that gives us some nice examples of estimation logic that we can fix.
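
           A rough sketch of the kind of check being suggested (invented names, written purely for illustration; this is not existing Derby code and the real hook would live wherever the plan text is logged):

               // Compare the optimizer's estimates against what actually happened
               // at execution time and complain when they are far apart.
               public class EstimateChecker {
                   private static final double SLOP = 10.0; // flag estimates off by >10x

                   public static void check(String plan,
                                             double estimatedRows, double actualRows,
                                             double estimatedCostMillis, double actualMillis) {
                       if (actualRows <= 0 || actualMillis <= 0) {
                           return;
                       }
                       if (estimatedRows / actualRows > SLOP
                               || estimatedCostMillis / actualMillis > SLOP) {
                           System.err.println("Poor estimate for plan:\n" + plan);
                           System.err.println("  estimated rows = " + estimatedRows
                                   + ", actual rows = " + actualRows);
                           System.err.println("  estimated cost = " + estimatedCostMillis
                                   + " ms, actual time = " + actualMillis + " ms");
                       }
                   }
               }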

          A B added a comment -

          > Can the code detect, at execution time, how accurate its estimate was?

          Good question. I ran the query attached to this issue using the ij "cursor" command and based strictly on that, it looks like the query plan is written to log after the query executes and the cursor has been closed.

           This seems to indicate that we still have the query plan information--including the estimated row count and the estimated cost--somewhere accessible after the query has executed. So if there's an easy way to figure out how many rows were actually returned by the cursor and also the time it took to retrieve all of the rows, then it seems like it would indeed be possible to dump some information about the accuracy of the optimizer's cost estimates.

          I don't know enough about when/how the query plan is logged nor about the state of things at that time in processing to say for sure how hard it would be for the code to determine the accuracy of the optimizer's estimates, but I agree that this is something which at least seems feasible and which could be very useful as a first step in tracking down problems with cost estimates...

          A B added a comment -

          Attaching an incomplete and potentially incorrect patch that should NOT be committed. This is just the result of some playing around that I was doing with cost estimates in the Optimizer in an attempt to resolve this issue. It's possible that the premise/theory behind the changes is incorrect and thus that this approach is not a viable solution. However, I am seeing considerable improvement in the compilation AND execution times for many large queries after applying this patch.

          As one example, I'm seeing the following times before and after applying this patch when running the DERBY-1777 repro script:

           [ Compile times only (repro does not execute the queries) ]

           Query                       Before       After
           ViewerRoot                  3.709s      3.782s
           ViewerSanL0               156.900s      9.387s
           ViewerOtherL0               0.204s      0.374s
           ViewerServerL0              0.093s      0.062s
           ViewerStorageL0             0.186s      0.145s
           ViewerSanL1               574.928s     18.915s
           ViewerOtherL1             163.653s      2.793s
           ViewerServerL1            368.006s      5.212s
           ViewerSubsystemL1         373.940s      2.862s
           ViewerTapeLibraryL1        10.633s      1.693s
           ViewerOtherL2            1368.968s     64.909s
           ViewerServerL2           2223.887s     64.595s
           ViewerPropagatedStatus     82.467s      1.906s
           ViewerSubsystemL2        1790.749s     70.922s
           ViewerTapeLibraryL2      1738.614s     67.455s
           ViewerSwitchL2           2098.310s     38.211s

           The difference in some cases is as much as 35x.

           I also ran the queries attached to DERBY-1205 before and after applying the patch. Without the patch two queries in particular take 40 and 3 seconds to compile, respectively, and execute in 3 to 5 seconds (these numbers are rough; I don't have the exact figures). With the patch they compile in 10 and 3 seconds respectively and execute in under a second.

          And finally, I ran the script attached to DERBY-2130 before and after the patch. Before the patch the query runs in about 550 to 700 seconds; after the patch, it compiles in 5 seconds and executes in less than a second.

          BUT the patch is not by any means complete. When I ran derbyall I saw at least 20 failures, many of which are execution-time NPEs that, oddly enough, result from the changes to the currentCost (which I never would have expected). Also:

          1. I don't really know anything about "sort avoidance costs", so I have no idea if my changes make sense in that regard (I sort of doubt it).

          2. As described more in the discussions for DERBY-2130, I removed an "if-block" from OptimizerImpl that actually causes an infinite loop in lang/innerjoin.sql (I hacked around that before running derbyall). So that has to be resolved.

          3. I don't know if the "theory" behind my changes makes sense.

          Generally speaking there are two big "theories" behind these changes.

           A. In the optimizeIt() methods of UnionNodes, FromTables, and JoinNodes we are multiplying the cost estimates by the row count of the "outerCost" more times than we should. This part of the patch makes sense to me and I think is probably mostly correct.

          B. We shouldn't be adding up the costs of the optimizables for a join order. At least, not the way we currently do. If I have a join order [ 1 3 2] then the join tree looks like this:

                    JOIN[0]
                   /       \
               JOIN[1]    Opt[2]
              /       \
           Opt[1]    Opt[3]

          The cost of Opt[2] is actually the cost of JOIN[0], which is the cost of joining JOIN[1] with Opt[2]. So why are we adding the cost of JOIN[1] to the cost of JOIN[0]? Isn't the cost of JOIN[1] already factored into the cost of JOIN[0]? So the changes in this patch basically remove the addition of JOIN[1] to JOIN[0] and instead say that the final cost of the join order is simply the cost of JOIN[0].

           This could very well be incorrect thinking. For one thing, I think it ignores the cost of actually scanning the rows from Opt[1] (and potentially the other opts as well?), which may be a problem. That's why this patch should not be committed, and why further discussion/investigation is required. But for what it's worth, it really speeds things up.
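
           To make the two theories concrete, here is a toy numeric sketch (invented numbers and formulas, purely for illustration; this is not how Derby actually computes costs):

               // Toy illustration of theories A and B above.
               public class CostSketch {
                   public static void main(String[] args) {
                       // Theory A: the per-probe cost should be scaled by the outer
                       // row count once...
                       double outerRows = 100;
                       double innerCostPerProbe = 5;
                       double scaledOnce = innerCostPerProbe * outerRows;              // 500
                       // ...but if several levels each multiply by the outer row
                       // count, the estimate explodes.
                       double scaledTwice = innerCostPerProbe * outerRows * outerRows; // 50,000

                       // Theory B: if JOIN[1]'s work is already included in JOIN[0]'s
                       // cost, adding JOIN[1] in again double-counts it.
                       double costJoin1 = 1000;
                       double costJoin0 = costJoin1 + 200;       // already contains JOIN[1]
                       double summed = costJoin0 + costJoin1;    // 2200 (double-counted)
                       double topOnly = costJoin0;               // 1200 (patch's approach)

                       System.out.println("A: once=" + scaledOnce + ", twice=" + scaledTwice);
                       System.out.println("B: summed=" + summed + ", top-only=" + topOnly);
                   }
               }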

          YMMV. If anyone has the time/inclination to pick this up and continue working with it (I am not actively pursuing this at the moment), please feel free!

          Knut Anders Hatlen added a comment -

          Triaged for 10.5.2.


            People

    • Assignee: Unassigned
    • Reporter: A B
    • Votes: 1
    • Watchers: 0
