  1. Jackrabbit Oak
  2. OAK-7379

Lucene Index: per-column selectivity, assume 5 unique entries

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.9.0, 1.10.0
    • Component/s: lucene, query
    • Labels: None

    Description

      Currently, if a query has a property restriction of the form "property = x", and the property is indexed in a Lucene property index, the estimated cost of the index is the number of documents indexed for that property. This is a very conservative estimate: it assumes that all documents have the same value. So the cost is relatively high for that index.

      In almost all cases, there are many distinct values for a property. Rarely, there are only a few values, or a skewed distribution where one value accounts for most documents. But in almost all cases there are more than 5 distinct values.

      I think it makes sense to use 5 as the default value. It is still conservative (cost of the index is high), but much better than now.
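
      To illustrate the arithmetic, here is a minimal sketch of the proposed estimate. It is not the actual Oak code: the class name, method name and constant are hypothetical, and it only assumes that the estimated entry count is the number of indexed documents divided by the assumed number of distinct values.

```java
public final class SelectivityEstimate {

    // Hypothetical constant: the default number of distinct values proposed in this issue.
    static final int DEFAULT_DISTINCT_VALUES = 5;

    private SelectivityEstimate() {
    }

    /**
     * Estimates how many entries a "property = x" restriction returns.
     *
     * @param indexedDocCount number of documents indexed for the property
     * @param assumedDistinctValues assumed number of distinct values;
     *        1 reproduces the current behavior (all documents share one value)
     */
    static long estimateEntryCount(long indexedDocCount, int assumedDistinctValues) {
        if (indexedDocCount <= 0) {
            return 0;
        }
        // Guard against zero or negative settings.
        int divisor = Math.max(1, assumedDistinctValues);
        // Round up so that a non-empty index never yields an estimate of 0.
        return (indexedDocCount + divisor - 1) / divisor;
    }
}
```

      With 1,000,000 indexed documents, an assumed count of 1 (the current behavior) gives an estimate of 1,000,000 entries, while the proposed default of 5 gives 200,000. The estimate stays on the high side whenever the property really has more than 5 distinct values.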

          People

            Assignee: Thomas Mueller (thomasm)
            Reporter: Thomas Mueller (thomasm)