  1. Jackrabbit Oak
  2. OAK-3219

Lucene IndexPlanner should also account for number of property constraints evaluated while giving cost estimation


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: lucene
    • Labels:

      Description

      Currently the cost returned by the Lucene index is a function of the number of indexed documents present in the index. If the number of indexed entries is high, this can reduce the chance of the index being selected when a property index also supports one of the property constraints.

      /jcr:root/content/freestyle-cms/customers//element(*, cq:Page)
      [(jcr:content/@title = 'm' or jcr:like(jcr:content/@title, 'm%')) 
      and jcr:content/@sling:resourceType = '/components/page/customer']
      

      Consider the above query with the following index definitions:

      • A property index on sling:resourceType
      • A Lucene index for cq:Page with the properties jcr:content/title and jcr:content/sling:resourceType indexed, and with path restriction evaluation enabled

      The two indexes can help with the following restrictions:

      1. Property index
        1. Path restriction
        2. Property restriction on sling:resourceType
      2. Lucene index
        1. NodeType restriction
        2. Property restriction on sling:resourceType
        3. Property restriction on title
        4. Path restriction

      The cost estimate currently works like this:

      • Property index - f(indexedValueEstimate, estimateOfNodesUnderGivenPath)
        • indexedValueEstimate - For 'sling:resourceType=foo' it is the approximate count of nodes having that property set to 'foo'
        • estimateOfNodesUnderGivenPath - It is derived from an approximate estimate of the nodes present under the given path
      • Lucene Index - f(totalIndexedEntries)

      Because the Lucene cost calculation is this simple, it does not reflect reality. The following two changes can be made to improve it:

      • Given that the Lucene index can handle more constraints (4) than the property index (2), the cost estimate it returns should reflect this. This can be done by setting costPerEntry to 1/(number of property restrictions evaluated)
      • Get the count for the queried property value - This is similar to what PropertyIndex does and assumes that Lucene can provide that information at O(1) cost. In case of multiple supported property restrictions this can be the minimum of all the counts
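
      The two proposed changes can be sketched roughly as follows. This is only an illustration and not the actual IndexPlanner code: the class LuceneCostSketch, its method names, the fallback to 1.0 when no property restriction is evaluated, and the final cost = costPerEntry * estimatedEntryCount are all assumptions, and the numbers in main are made up.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the proposal; not part of Oak.
public class LuceneCostSketch {

    // Change 1: costPerEntry = 1 / (number of property restrictions evaluated).
    // Assumption: fall back to 1.0 when no property restriction is evaluated.
    static double costPerEntry(int propertyRestrictionsEvaluated) {
        return propertyRestrictionsEvaluated <= 0 ? 1.0 : 1.0 / propertyRestrictionsEvaluated;
    }

    // Change 2: use the minimum of the per-restriction value counts,
    // never exceeding the total number of indexed entries.
    static long estimatedEntryCount(long totalIndexedEntries, List<Long> perRestrictionCounts) {
        long estimate = totalIndexedEntries;
        for (long count : perRestrictionCounts) {
            estimate = Math.min(estimate, count);
        }
        return estimate;
    }

    // Assumption: overall cost is costPerEntry multiplied by the entry estimate.
    static double estimatedCost(long totalIndexedEntries, List<Long> perRestrictionCounts) {
        return costPerEntry(perRestrictionCounts.size())
                * estimatedEntryCount(totalIndexedEntries, perRestrictionCounts);
    }

    public static void main(String[] args) {
        // Made-up figures: 1,000,000 indexed pages, 5,000 entries matching
        // sling:resourceType and 200 entries matching the title restriction.
        double cost = estimatedCost(1_000_000L, Arrays.asList(5_000L, 200L));
        System.out.println(cost); // prints 100.0 (0.5 * 200)
    }
}
```

      With these made-up figures the estimate drops from 1,000,000 (total indexed entries) to 100, which would make the Lucene index competitive with a property index that only evaluates the sling:resourceType restriction.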


            People

            • Assignee:
              thomasm Thomas Mueller
              Reporter:
              chetanm Chetan Mehrotra
