Jackrabbit Oak / OAK-4816

Property index: cost estimate with path restriction is too optimistic

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.12, 1.6.0
    • Component/s: query
    • Labels: None

    Description

      The property index cost estimate is too optimistic when a query combines a property restriction with a path restriction. The current algorithm, as documented in http://jackrabbit.apache.org/oak/docs/query/property-index.html#Cost_Estimation , assumes that matching entries are evenly distributed over the whole repository. Often they are not. In the extreme case, all entries that match the property restriction are in the subtree that matches the path restriction. Example:

      • 10'000 nodes with property color "red".
      • 1 million nodes in the repository
      • 10'000 nodes in the subtree /content
      • query /jcr:root/content//*[@color = 'red']

      Currently, the cost estimate is about 100, because there are about 10'000 entries for "red" and "/content" contains 1% of all nodes (10'000 * 1% = 100). But in reality, there might be 10'000 entries with color "red" in that subtree (that is, all of them).

      The cost estimation should take that into account, and assume that at least 80% of the matching nodes are in that subtree (if the subtree contains that many nodes).
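
      The arithmetic can be sketched as follows. This is only an illustration of the numbers in this issue, not the Oak implementation: the function names and the min_fraction parameter are hypothetical, and reading "if the subtree contains that many nodes" as a cap at the subtree size is an assumption.

```python
def uniform_estimate(matching_entries, subtree_nodes, total_nodes):
    """Estimate that assumes matches are spread evenly over the repository."""
    return matching_entries * subtree_nodes / total_nodes


def adjusted_estimate(matching_entries, subtree_nodes, total_nodes,
                      min_fraction=0.8):
    """Hypothetical adjusted estimate (illustration only).

    Assumes at least min_fraction of the matching entries are in the
    subtree, capped at the number of nodes the subtree contains.
    """
    uniform = uniform_estimate(matching_entries, subtree_nodes, total_nodes)
    # The subtree cannot hold more matches than it has nodes.
    floor = min(min_fraction * matching_entries, subtree_nodes)
    return max(uniform, floor)


if __name__ == "__main__":
    # Numbers from the example: 10'000 "red" entries, 1 million nodes
    # in the repository, 10'000 nodes below /content.
    print(uniform_estimate(10_000, 10_000, 1_000_000))   # about 100
    print(adjusted_estimate(10_000, 10_000, 1_000_000))  # about 8'000
```

      With a smaller subtree of, say, 500 nodes, the cap applies and the adjusted estimate is 500 rather than 8'000.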

          People

            thomasm Thomas Mueller
            thomasm Thomas Mueller