  1. Jackrabbit Oak
  2. OAK-6333

IndexPlanner should use actual entryCount instead of limiting it to 1000

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.4, 1.6.4, 1.4.18, 1.8.0, 1.2.28
    • Component/s: lucene
    • Labels:
      None

      Description

      Currently, IndexPlanner uses the following logic to estimate the entryCount:

      1. If the index has fulltext indexing enabled and the query specifies a fulltext constraint
        1. If an entryCount value is defined, then use min(entryCount, numDocs)
        2. If not, then use numDocs, i.e. the actual entry count
      2. If the index is a pure property index, i.e. none of the property definitions have analyzed set to true
        1. If an entryCount value is defined, then use min(entryCount, numDocs)
        2. Else take min(1000, numDocs)
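
The two cases above can be sketched as a single function. This is a minimal illustration only, not the actual IndexPlanner code: the class name, the method name, the `UNDEFINED` marker and the boolean parameter are all hypothetical names chosen for this example.

```java
public class EntryCountSketch {
    // Hypothetical marker meaning "no entryCount configured on the index definition".
    static final long UNDEFINED = -1;
    // The cap described in case 2.2 above.
    static final long PROPERTY_INDEX_CAP = 1000;

    /**
     * @param fulltextCase true for case 1 (fulltext index and fulltext constraint),
     *                     false for case 2 (pure property index)
     * @param entryCount   configured entryCount, or UNDEFINED
     * @param numDocs      actual number of documents in the index
     */
    static long estimate(boolean fulltextCase, long entryCount, long numDocs) {
        if (entryCount != UNDEFINED) {
            // Cases 1.1 and 2.1: a configured value is bounded by the real count.
            return Math.min(entryCount, numDocs);
        }
        // Case 1.2 uses the real count; case 2.2 caps it.
        return fulltextCase ? numDocs : Math.min(PROPERTY_INDEX_CAP, numDocs);
    }

    public static void main(String[] args) {
        System.out.println(estimate(true, UNDEFINED, 50_000));
        System.out.println(estimate(false, UNDEFINED, 50_000));
    }
}
```

With no entryCount configured, a fulltext query against a 50,000-document index is estimated at the full count, while the same index used as a pure property index is estimated at the cap.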

      Revisiting the logic for #2, it appears that in the 1.0.x days (OAK-2200) we capped it at 1000 because cost estimation for property indexes was inaccurate: they used to report low values, causing the Lucene index to lose.

      With support for Counters, cost estimation for property indexes has improved, so we should now remove this cap and let the planner use numDocs.

      One area where this causes problems is when we have two indexes and one is a superset of the other, for example /oak:index/asset and /content/en/oak:index/asset, where both have some matching properties. Logically, if the query can be handled by the sub index, then that index should be picked, but currently either of them can be picked, making the query plan nondeterministic.
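
To illustrate the superset case, here is a small sketch comparing the capped and uncapped estimates for two pure property indexes with no entryCount defined. The document counts are invented for the example, and the class and method names are hypothetical. It assumes the tie arises because both indexes hit the same cap.

```java
public class CappedEstimateDemo {
    static final long CAP = 1000;

    // Current behaviour for a pure property index without a defined entryCount.
    static long capped(long numDocs) {
        return Math.min(CAP, numDocs);
    }

    // Proposed behaviour: report the actual document count.
    static long uncapped(long numDocs) {
        return numDocs;
    }

    public static void main(String[] args) {
        long globalIndexDocs = 500_000; // invented size for /oak:index/asset
        long subIndexDocs = 20_000;     // invented size for the index under /content/en

        // Both estimates collapse to the cap, so they cannot be told apart.
        System.out.println("capped:   " + capped(globalIndexDocs) + " vs " + capped(subIndexDocs));
        // Without the cap, the smaller sub index reports the lower estimate.
        System.out.println("uncapped: " + uncapped(globalIndexDocs) + " vs " + uncapped(subIndexDocs));
    }
}
```

Under these assumptions the capped estimates are identical, so nothing in the entry count favours the sub index. Without the cap, the sub index reports the smaller value.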

              People

              • Assignee:
                chetanm Chetan Mehrotra
                Reporter:
                chetanm Chetan Mehrotra
              • Votes:
                1
                Watchers:
                3