[OAK-1907] Better cost estimates for traversal, property, and ordered indexes - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.0, 1.0.1, 1.0.2
Fix Version/s: 1.1.3
Component/s: query
Labels: None

Description

Currently, the cost estimates for traversal, the property index, and the ordered index do not take the number of nodes into account once there are more than about 100 nodes. This is problematic because, in many cases, the incorrect cost estimate causes the wrong index to be used.

To get better cost estimates, we need at least a very rough estimate of the number of nodes below a given path.

One idea: when adding a node, if Math.random() < 0.00001, add a hidden, randomly named property (for example ":count-xyz", where xyz is a UUID, with the value 100'000) to the parents of that node, so that we know there are probably more than 100'000 nodes below a given path. When removing a node, use the same algorithm to add a hidden property (":count-xyz", with the value -100'000). This should cause a slowdown of less than 0.01%, while allowing much better cost estimates. These properties could be consolidated asynchronously if needed.
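
The sampling idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached ApproxCount.java and not Oak's actual implementation: the class name ApproxCounterSketch, the in-memory map standing in for hidden properties, and the injectable random source are all hypothetical. Each add or remove is recorded with probability 1/100'000 and weight ±100'000, so every operation contributes ±1 to the sum in expectation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.UUID;
import java.util.function.DoubleSupplier;

/** Hypothetical sketch of the sampled node counter; all names are assumptions. */
class ApproxCounterSketch {
    static final long RESOLUTION = 100_000;

    // Stand-in for hidden properties: path -> (":count-<uuid>" -> value).
    private final Map<String, Map<String, Long>> hidden = new HashMap<>();
    // Random source returning values in [0, 1); injectable so it can be controlled.
    private final DoubleSupplier random;

    ApproxCounterSketch(DoubleSupplier random) {
        this.random = random;
    }

    void nodeAdded(String path) {
        sample(path, RESOLUTION);
    }

    void nodeRemoved(String path) {
        sample(path, -RESOLUTION);
    }

    // With probability 1/RESOLUTION, record delta on every ancestor of path.
    private void sample(String path, long delta) {
        if (random.getAsDouble() >= 1.0 / RESOLUTION) {
            return; // the common case: nothing extra is written
        }
        String p = path;
        while (!p.equals("/")) {
            int i = p.lastIndexOf('/');
            p = i <= 0 ? "/" : p.substring(0, i);
            // A random name avoids conflicts between concurrent writers.
            hidden.computeIfAbsent(p, k -> new HashMap<>())
                  .put(":count-" + UUID.randomUUID(), delta);
        }
    }

    // Rough estimate of the number of nodes below path (never negative).
    long estimate(String path) {
        long sum = 0;
        for (long v : hidden.getOrDefault(path, Collections.<String, Long>emptyMap()).values()) {
            sum += v;
        }
        return Math.max(0, sum);
    }

    public static void main(String[] args) {
        Random r = new Random();
        ApproxCounterSketch counter = new ApproxCounterSketch(r::nextDouble);
        for (int i = 0; i < 1_000_000; i++) {
            counter.nodeAdded("/content/node" + i);
        }
        System.out.println("estimate below /content: " + counter.estimate("/content"));
    }
}
```

The estimate is always a multiple of 100'000 and can be far off for small subtrees, which seems acceptable for choosing between query plans. Summing the per-path entries into a single value would correspond to the asynchronous consolidation mentioned above.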

Attachments

ApproxCount.java, 24/Jun/14 12:09, 3 kB, Thomas Mueller
OAK-1907.diff, 26/Jun/14 12:05, 25 kB, Davide Giannella
probability_50t_1m.png, 03/Mar/16 10:22, 90 kB, Thomas Mueller

Issue Links

is duplicated by

OAK-1898 Query: Incorrect cost calculation for traversal (Closed)
OAK-1735 Query: automatically update index statistics to get better cost estimates (Closed)

is required by

OAK-2341 Use approx counters property index costs even when path restriction is available (Closed)
OAK-2362 Remove entryCount from NodeType Index (Closed)

relates to

OAK-2725 Wrong indexed query estimates exceed more than double the actual index entries (Resolved)
OAK-2852 Query engine: if counter index is not available, cost of traversing is too low (Closed)
OAK-4065 Counter index can get out of sync (Closed)
OAK-2839 Without "counter" index, some queries use traversal instead of an index (Closed)

People

Assignee: Thomas Mueller
Reporter: Thomas Mueller
Votes: 0
Watchers: 4

Dates

Created: 23/Jun/14 09:39
Updated: 08/Oct/19 15:21
Resolved: 09/Dec/14 12:36