Jackrabbit Oak / OAK-7379

Lucene Index: per-column selectivity, assume 5 unique entries

      Description

      Currently, if a query has a property restriction of the form "property = x" and the property is indexed in a Lucene property index, the estimated cost of the index is the number of documents indexed for that property. This is a very conservative estimate: it assumes that all documents have the same value, so the cost of that index is relatively high.

      In almost all cases, a property has many distinct values. Rarely, there are only a few values, or a skewed distribution where one value accounts for most documents. But in almost all cases there are more than 5 distinct values.

      I think it makes sense to use 5 as the default number of distinct values per property. The estimate is still conservative (the cost of the index stays high), but it is much better than the current one.
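
      To make the arithmetic concrete, here is a minimal sketch of the proposed estimate, assuming the cost is simply the number of documents indexed for the property divided by the assumed number of distinct values. The class name, method name, and constant are hypothetical and are not taken from the Oak code base; the actual Lucene index cost calculation may include other factors.

```java
public final class SelectivityEstimate {

    // Hypothetical constant: the default proposed in this issue.
    static final int DEFAULT_DISTINCT_VALUES = 5;

    private SelectivityEstimate() {
    }

    /**
     * Estimates how many documents match "property = x", assuming the
     * values are spread evenly over the assumed number of distinct values.
     * Passing 1 reproduces the current behaviour (every document matches).
     */
    static long estimateEntryCount(long docsForProperty, int assumedDistinctValues) {
        if (docsForProperty < 0 || assumedDistinctValues < 1) {
            throw new IllegalArgumentException("invalid input");
        }
        // Ceiling division, so a non-empty index never yields an estimate of 0.
        return (docsForProperty + assumedDistinctValues - 1) / assumedDistinctValues;
    }

    public static void main(String[] args) {
        long docs = 1_000_000L;
        System.out.println("current estimate:  " + estimateEntryCount(docs, 1));
        System.out.println("proposed estimate: " + estimateEntryCount(docs, DEFAULT_DISTINCT_VALUES));
    }
}
```

      With an assumed value of 1 the sketch returns the full document count, which matches the current behaviour described above; with 5 it returns one fifth of that, rounded up.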

              People

              • Assignee: Thomas Mueller (thomasm)
              • Reporter: Thomas Mueller (thomasm)
              • Votes: 0
              • Watchers: 2
