Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
Description
As described in OAK-2081, the cost formula currently used in the query engine is not correct if "limit" is used, because it doesn't account for false positives.
Example: Let's say there are two indexes:
- color: 10000 nodes with color=red, but a bit slower (lets say a remote index), cost per entry is 1.5.
- size: 20000 nodes with size=M, but a bit faster (lets say a local index), cost per entry is 1.
Without limit, the index for "color" should be used as 10000 * 1.5 = 15000 is lower than 20000 * 1 = 20000.
With limit=100, then we could calculate as follows: there are at most 10000 entries (according to index "color"), so the false positive rate of the "size" index is at least 50%. So cost of "color" is 100 * 1.5 = 150. Cost of "size" is 100 * 1 = 100, but with false positive rate of 50%, so cost is actually 200. Therefor, still the index "color" should be used.