The property index cost estimation is too optimistic when a query has both a property restriction and a path restriction. The current algorithm, as documented in http://jackrabbit.apache.org/oak/docs/query/property-index.html#Cost_Estimation , assumes that matching entries are evenly distributed over the whole repository. This assumption often does not hold. In the extreme case, all entries that match the property restriction are in the subtree that matches the path restriction. Example:
- 10'000 nodes with property color "red".
- 1 million nodes in the repository
- 10'000 nodes in the subtree /content
- query /jcr:root/content//*[@color = 'red']
Currently, the cost estimate is about 100, because there are about 10'000 entries for "red" and "/content" contains 1% of all nodes (10'000 × 1% = 100). But in reality, all 10'000 entries with color "red" might be in that subtree.
The cost estimation should take that into account and assume that at least 80% of the matching nodes are in that subtree (provided the subtree contains that many nodes).
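To make the numbers concrete, here is a minimal sketch in Java comparing the even-distribution estimate with the proposed lower bound. It is not Oak's actual implementation: the class name, method names, and the constant MIN_SUBTREE_SHARE are hypothetical, and it assumes that "if the subtree contains that many nodes" means the 80% figure is capped at the subtree's node count.

```java
// Sketch only: all names here are hypothetical and not part of Oak's API.
final class PathRestrictedCostSketch {

    /** Assumed minimum share of matching entries located in the restricted subtree. */
    static final double MIN_SUBTREE_SHARE = 0.8;

    /**
     * Even-distribution estimate: scale the number of matching entries
     * by the fraction of the repository that the subtree covers.
     */
    static long evenDistributionEstimate(long matchingEntries, long subtreeNodes, long totalNodes) {
        if (totalNodes <= 0) {
            return matchingEntries; // no size information: do not scale down
        }
        double share = Math.min(1.0, (double) subtreeNodes / totalNodes);
        return Math.round(matchingEntries * share);
    }

    /**
     * Proposed estimate: at least 80% of the matching entries, capped at the
     * subtree size (assumption), and never lower than the even-distribution value.
     */
    static long adjustedEstimate(long matchingEntries, long subtreeNodes, long totalNodes) {
        long even = evenDistributionEstimate(matchingEntries, subtreeNodes, totalNodes);
        long lowerBound = Math.min(Math.round(matchingEntries * MIN_SUBTREE_SHARE), subtreeNodes);
        return Math.max(even, lowerBound);
    }

    public static void main(String[] args) {
        // Numbers from the example: 10'000 "red" entries, 10'000 nodes under /content, 1 million nodes total.
        System.out.println(evenDistributionEstimate(10_000, 10_000, 1_000_000)); // prints 100
        System.out.println(adjustedEstimate(10_000, 10_000, 1_000_000));         // prints 8000
    }
}
```

With the example above, the estimate rises from 100 to 8'000, which is much closer to the worst case of 10'000. For a small subtree (say 500 nodes), the cap keeps the estimate at 500 instead of 8'000.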