[LUCENE-6690] Speed up MultiTermsEnum.next() - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 5.3
Component/s: None
Labels: None
Lucene Fields: New

Description

OrdinalMap is very useful when computing top terms on a multi-segment index. However, I've seen it be occasionally slow to build, which made either faceting (when the ordinal map is computed lazily) or reopening (when it is computed eagerly) slow. So out of curiosity, I profiled ordinal map building on a simple index: 10M random strings of length between 0 and 20, stored as a SORTED doc values field. The index has 19 segments. The bottleneck, by far, was MultiTermsEnum.next(), due to lots of BytesRef comparisons (UTF8SortedAsUnicodeComparator).

MultiTermsEnum stores sub enums in two different places:

top: a simple array containing all enums that are on the current term.
queue: a priority queue for enums that are not exhausted yet but are beyond the current term.

A non-exhausted enum is in exactly one of these data structures. When moving to the next term, MultiTermsEnum advances all enums in top, then adds them to queue, and finally pops all enums that are on the same (new smallest) term back into top.
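
The pop-and-re-add cycle described above can be sketched roughly as follows. This is only an illustration, not the Lucene code: the `Sub` class and `merge` method are hypothetical stand-ins for the sub enums and for repeated calls to next(), and plain `String` comparison is assumed in place of BytesRef comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class PopAndReAddMerge {
  /** Hypothetical stand-in for a per-segment terms enum: a cursor over sorted, distinct strings. */
  static final class Sub {
    final Iterator<String> it;
    String current;

    Sub(List<String> sortedTerms) {
      this.it = sortedTerms.iterator();
    }

    /** Moves to the next term; returns false once the cursor is exhausted. */
    boolean next() {
      if (it.hasNext()) {
        current = it.next();
        return true;
      }
      current = null;
      return false;
    }
  }

  /** Merges per-segment sorted term lists into one sorted list of distinct terms. */
  static List<String> merge(List<List<String>> segments) {
    PriorityQueue<Sub> queue = new PriorityQueue<>((a, b) -> a.current.compareTo(b.current));
    List<Sub> top = new ArrayList<>();
    for (List<String> terms : segments) {
      Sub sub = new Sub(terms);
      if (sub.next()) {
        queue.add(sub);
      }
    }
    List<String> merged = new ArrayList<>();
    while (true) {
      // Advance every enum that was on the current term and re-add it to the queue.
      for (Sub sub : top) {
        if (sub.next()) {
          queue.add(sub);
        }
      }
      top.clear();
      if (queue.isEmpty()) {
        break;
      }
      // Pop every enum positioned on the new smallest term back into top.
      String term = queue.peek().current;
      while (!queue.isEmpty() && queue.peek().current.equals(term)) {
        top.add(queue.poll());
      }
      merged.add(term);
    }
    return merged;
  }

  public static void main(String[] args) {
    List<List<String>> segments = Arrays.asList(
        Arrays.asList("a", "c"), Arrays.asList("a", "b"), Arrays.asList("c", "d"));
    System.out.println(merge(segments)); // prints [a, b, c, d]
  }
}
```

Every enum on the current term costs one removal and one insertion per step in this scheme, and each of those reorders the heap through term comparisons.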

We could save reorderings of the priority queue by leaving entries in the priority queue and calling updateTop after advancing the enums that are on the current term. This is already what we do for disjunctions of doc IDs in DISIPriorityQueue.
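
A minimal sketch of this idea, under the same simplifying assumptions (hypothetical `Sub`, `Heap` and `merge` names, `String` terms instead of BytesRef), might look like the code below. It is not the attached patch: in particular, it advances enums eagerly right after recording a term, whereas a real TermsEnum has to stay positioned on the current term until next() is called again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class UpdateTopMerge {
  /** Hypothetical stand-in for a per-segment terms enum: a cursor over sorted, distinct strings. */
  static final class Sub {
    final Iterator<String> it;
    String current;

    Sub(List<String> sortedTerms) {
      this.it = sortedTerms.iterator();
    }

    boolean next() {
      if (it.hasNext()) {
        current = it.next();
        return true;
      }
      current = null;
      return false;
    }
  }

  /** Minimal binary min-heap ordered by Sub.current, with an in-place updateTop(). */
  static final class Heap {
    private final Sub[] heap;
    private int size;

    Heap(int capacity) {
      heap = new Sub[capacity];
    }

    boolean isEmpty() {
      return size == 0;
    }

    Sub top() {
      return heap[0];
    }

    void add(Sub sub) {
      int i = size++;
      heap[i] = sub;
      while (i > 0) { // sift up
        int parent = (i - 1) >>> 1;
        if (!less(i, parent)) {
          break;
        }
        swap(i, parent);
        i = parent;
      }
    }

    /** Restores heap order after the top entry's term has changed. */
    void updateTop() {
      siftDown(0);
    }

    /** Removes the top entry. */
    void pop() {
      heap[0] = heap[--size];
      heap[size] = null;
      if (size > 0) {
        siftDown(0);
      }
    }

    private void siftDown(int i) {
      while (true) {
        int left = 2 * i + 1, right = left + 1, smallest = i;
        if (left < size && less(left, smallest)) smallest = left;
        if (right < size && less(right, smallest)) smallest = right;
        if (smallest == i) return;
        swap(i, smallest);
        i = smallest;
      }
    }

    private boolean less(int a, int b) {
      return heap[a].current.compareTo(heap[b].current) < 0;
    }

    private void swap(int a, int b) {
      Sub tmp = heap[a];
      heap[a] = heap[b];
      heap[b] = tmp;
    }
  }

  /** Merges per-segment sorted term lists into one sorted list of distinct terms. */
  static List<String> merge(List<List<String>> segments) {
    Heap heap = new Heap(segments.size());
    for (List<String> terms : segments) {
      Sub sub = new Sub(terms);
      if (sub.next()) heap.add(sub);
    }
    List<String> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
      String term = heap.top().current;
      merged.add(term);
      // Advance every enum on `term` in place: one sift-down each, no remove + insert.
      while (!heap.isEmpty() && heap.top().current.equals(term)) {
        if (heap.top().next()) heap.updateTop(); else heap.pop();
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    System.out.println(merge(Arrays.asList(
        Arrays.asList("a", "c"), Arrays.asList("a", "b"), Arrays.asList("c", "d")))); // prints [a, b, c, d]
  }
}
```

Here an advanced enum is repositioned with a single sift-down from the root, and matching enums are detected with an equality check on the top entry instead of an ordering comparison.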

On the index described above, with current trunk, building an OrdinalMap calls UTF8SortedAsUnicodeComparator.compare 80114820 times and runs in 1.9s. With the change, it calls UTF8SortedAsUnicodeComparator.compare 36900694 times and BytesRef.equals 16297638 times, and runs in 1.4s (~26% faster).

Attachments

OrdinalMapBuildBench.java, 2 kB, added 20/Jul/15 16:11 by Adrien Grand
LUCENE-6690.patch, 4 kB, added 20/Jul/15 16:13 by Adrien Grand

People

Assignee: Adrien Grand

Reporter: Adrien Grand

Votes: 0

Watchers: 4

Dates

Created: 20/Jul/15 16:05

Updated: 28/Aug/22 14:39

Resolved: 22/Jul/15 11:29