[LUCENE-8980] Optimise SegmentTermsEnum.seekExact performance - ASF JIRA


Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 8.2
Fix Version/s: 8.3
Component/s: core/codecs
Labels:
- performance

Lucene Fields:

New

Description

In Elasticsearch, which is based on Lucene, each document has an indexed _id field that uniquely identifies it. When Elasticsearch uses the _id field to find a document, Lucene has to check every segment of the index. When the _id values are highly sequential, each segment tends to hold a narrow range of ids, so this lookup can be optimised.

Solution

Since Lucene stores the minimum and maximum term (minTerm/maxTerm) for each segment and field, these statistics can be used to speed up Lucene's lookup API. When SegmentTermsEnum.seekExact() is called to look up a term, it can first check whether the term falls within the range [minTerm, maxTerm]. If it does not, the segment cannot contain the term and can be skipped as early as possible, without searching its terms dictionary.
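The range check described above can be illustrated with a minimal, self-contained sketch. It is not Lucene's code: `SegmentTerms`, `IdLookup` and `IdLookupDemo` are hypothetical names, a `TreeMap` stands in for a segment's terms dictionary, and `TreeMap.get()` stands in for `seekExact()`. Terms are compared as unsigned bytes, which is how Lucene orders `BytesRef` terms; real Lucene exposes the per-field bounds as `Terms.getMin()` and `Terms.getMax()`. The demo counts how many dictionaries are actually searched with and without the [minTerm, maxTerm] check.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical stand-in for one segment's terms dictionary for the _id field.
final class SegmentTerms {
    static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    final TreeMap<byte[], Integer> termToDoc = new TreeMap<>(UNSIGNED);
    byte[] minTerm; // smallest term in this segment (cf. Terms.getMin())
    byte[] maxTerm; // largest term in this segment (cf. Terms.getMax())

    void add(byte[] term, int docId) {
        termToDoc.put(term, docId);
        minTerm = termToDoc.firstKey();
        maxTerm = termToDoc.lastKey();
    }
}

// Looks up a unique id across segments, optionally skipping any segment whose
// [minTerm, maxTerm] range cannot contain the id.
final class IdLookup {
    private final List<SegmentTerms> segments;
    private final boolean prune;
    int seeks; // number of terms dictionaries actually searched

    IdLookup(List<SegmentTerms> segments, boolean prune) {
        this.segments = segments;
        this.prune = prune;
    }

    static boolean inRange(SegmentTerms seg, byte[] term) {
        if (seg.minTerm == null) {
            return false; // empty segment: nothing to find
        }
        return Arrays.compareUnsigned(term, seg.minTerm) >= 0
            && Arrays.compareUnsigned(term, seg.maxTerm) <= 0;
    }

    int find(byte[] id) {
        for (SegmentTerms seg : segments) {
            if (prune && !inRange(seg, id)) {
                continue; // two cheap comparisons instead of a dictionary seek
            }
            seeks++;
            Integer doc = seg.termToDoc.get(id); // stands in for seekExact()
            if (doc != null) {
                return doc;
            }
        }
        return -1; // id not present in any segment
    }
}

final class IdLookupDemo {
    static byte[] id(int n) {
        return String.format(Locale.ROOT, "id-%04d", n).getBytes(StandardCharsets.UTF_8);
    }

    // Sequential ids written in order: segment s holds ids [s*perSeg, (s+1)*perSeg).
    static List<SegmentTerms> sequentialSegments(int segCount, int perSeg) {
        List<SegmentTerms> segments = new ArrayList<>();
        for (int s = 0; s < segCount; s++) {
            SegmentTerms seg = new SegmentTerms();
            for (int i = s * perSeg; i < (s + 1) * perSeg; i++) {
                seg.add(id(i), i);
            }
            segments.add(seg);
        }
        return segments;
    }

    public static void main(String[] args) {
        List<SegmentTerms> segments = sequentialSegments(4, 100);
        IdLookup naive = new IdLookup(segments, false);
        IdLookup pruned = new IdLookup(segments, true);
        int a = naive.find(id(350));
        int b = pruned.find(id(350));
        System.out.println("doc=" + a + " seeks=" + naive.seeks);  // doc=350 seeks=4
        System.out.println("doc=" + b + " seeks=" + pruned.seeks); // doc=350 seeks=1
    }
}
```

With sequential ids the segment ranges barely overlap, so most segments are rejected by the two comparisons alone, and an id outside every range costs no seek at all. With random ids (for example UUIDs), each segment's range tends to span almost the whole key space, so the check rarely skips anything; it only adds two comparisons per segment.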

This improvement benefits Elasticsearch's read/write APIs as well as Lucene's term lookup API.

Attachments

Issue Links

links to

GitHub Pull Request #884

People

Assignee:: David Smiley

Reporter:: Guoqiang Jiang

Votes:: 0

Watchers:: 3

Dates

Created:: 16/Sep/19 09:37

Updated:: 28/Aug/22 15:50

Resolved:: 26/Sep/19 19:58

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

5h 20m