Lucene - Core / LUCENE-8980

Optimise SegmentTermsEnum.seekExact performance

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 8.2
    • Fix Version/s: 8.3
    • Component/s: core/codecs
    • Labels:
    • Lucene Fields: New

      Description

      In Elasticsearch, which is built on Lucene, each document has an indexed _id field that uniquely identifies it. When Elasticsearch looks up a document by its _id, Lucene has to check every segment of the index. When the _id values are largely sequential, most segments cannot contain the requested term, so this lookup can be optimised.
       

      Solution

      Since Lucene stores the min and max term for each segment and field, we can use those statistics to speed up Lucene's lookup API. When SegmentTermsEnum.seekExact() is called to look up a term in an index, we can first check whether the term falls within the range between minTerm and maxTerm. If it does not, the segment cannot contain the term and can be skipped immediately.
       
      This improvement benefits the Elasticsearch read/write APIs and the Lucene lookup API.
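
      The range check described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual patch: MinMaxPruning, SegmentTermRange, and candidateSegments are hypothetical names, and the min/max values are hard-coded here, whereas in Lucene they would presumably come from the per-field statistics exposed by Terms.getMin() and Terms.getMax(). Terms are compared as unsigned bytes, which is assumed to match Lucene's term ordering.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MinMaxPruning {

    /** Hypothetical stand-in for one segment's min/max term of a field (not a Lucene class). */
    static final class SegmentTermRange {
        final String name;
        final byte[] minTerm;
        final byte[] maxTerm;

        SegmentTermRange(String name, String minTerm, String maxTerm) {
            this.name = name;
            this.minTerm = minTerm.getBytes(StandardCharsets.UTF_8);
            this.maxTerm = maxTerm.getBytes(StandardCharsets.UTF_8);
        }

        /** True if the term lies in [minTerm, maxTerm] under unsigned byte order. */
        boolean mayContain(byte[] term) {
            return Arrays.compareUnsigned(minTerm, term) <= 0
                && Arrays.compareUnsigned(term, maxTerm) <= 0;
        }
    }

    /** Returns the names of segments that still need a real term lookup for the given id. */
    static List<String> candidateSegments(List<SegmentTermRange> segments, String id) {
        byte[] term = id.getBytes(StandardCharsets.UTF_8);
        List<String> candidates = new ArrayList<>();
        for (SegmentTermRange segment : segments) {
            if (segment.mayContain(term)) {
                candidates.add(segment.name); // only these segments are searched
            }
            // Segments whose range excludes the term are skipped without a lookup.
        }
        return candidates;
    }
}
```

      With sequential ids, segment ranges rarely overlap, so a lookup typically reaches only one segment. A term inside the range may still be absent from the segment, so the check only rules segments out; it never confirms a match.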

              People

              • Assignee: David Smiley (dsmiley)
              • Reporter: Guoqiang Jiang (jgq2008303393)
              • Votes: 0
              • Watchers: 3

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 5h 20m