[LUCENE-4556] FuzzyTermsEnum creates tons of objects - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 4.0
Fix Version/s: 4.10, 6.0
Component/s: core/search, modules/spellchecker
Labels: None
Lucene Fields: New, Patch Available

Description

I ran into this problem in production using the DirectSpellChecker. The number of objects created by the spellchecker shoots through the roof very quickly. We ran about 130 queries and ended up with > 2M transitions / states. We spent 50% of the time in GC just because of transitions. Other parts of the system behave just fine here.

I talked quickly to Robert and gave a POC a shot, providing a LevenshteinAutomaton#toRunAutomaton(prefix, n) method to optimize this case and build an array-based structure converted into UTF-8 directly instead of going through the object-based APIs. This involved quite a few changes, but they are all package-private at this point. I have a patch that still has a fair set of nocommits, but it shows that it's possible and IMO worth the trouble to make this really usable in production. All tests pass with the patch; it's a start.
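To illustrate the general idea of an array-based structure (this is not the attached patch, and none of it is Lucene API), here is a minimal sketch of a deterministic automaton whose transitions are packed into flat int arrays rather than allocated as one object per transition. The class name `PackedAutomaton`, its fields, and the `{from, min, max, to}` row layout are all assumptions made up for this example.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration, not Lucene code: a deterministic automaton over
 * byte labels whose transitions live in flat int arrays, so running it
 * allocates no per-transition or per-state objects.
 */
final class PackedAutomaton {
    private final int[] offsets;   // transitions of state s occupy [offsets[s], offsets[s + 1])
    private final int[] mins;      // inclusive lower bound of each transition's label range
    private final int[] maxes;     // inclusive upper bound of each transition's label range
    private final int[] dests;     // destination state of each transition
    private final boolean[] accept;

    /** Each row of {@code transitions} is {from, min, max, to}; state 0 is the start state. */
    PackedAutomaton(int numStates, int[][] transitions, boolean[] accept) {
        int[][] sorted = transitions.clone();
        // Group transitions by source state, then order them by lower bound.
        Arrays.sort(sorted, (a, b) ->
            a[0] != b[0] ? Integer.compare(a[0], b[0]) : Integer.compare(a[1], b[1]));

        offsets = new int[numStates + 1];
        mins = new int[sorted.length];
        maxes = new int[sorted.length];
        dests = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            int[] t = sorted[i];
            offsets[t[0] + 1]++;          // count transitions per source state
            mins[i] = t[1];
            maxes[i] = t[2];
            dests[i] = t[3];
        }
        // Turn the per-state counts into start offsets (prefix sums).
        for (int s = 0; s < numStates; s++) {
            offsets[s + 1] += offsets[s];
        }
        this.accept = accept.clone();
    }

    /** Returns the next state for {@code label}, or -1 if no transition matches. */
    int step(int state, int label) {
        for (int i = offsets[state]; i < offsets[state + 1]; i++) {
            if (label >= mins[i] && label <= maxes[i]) {
                return dests[i];
            }
        }
        return -1;
    }

    /** Runs the automaton over UTF-8 bytes and reports whether it ends in an accept state. */
    boolean run(byte[] utf8) {
        int state = 0;
        for (byte b : utf8) {
            state = step(state, b & 0xFF);
            if (state == -1) {
                return false;
            }
        }
        return accept[state];
    }
}
```

With three states, transitions `{0, 'a', 'a', 1}` and `{1, 'b', 'c', 2}`, and only state 2 accepting, this sketch accepts the UTF-8 bytes of "ab" and "ac" and rejects "ad". However many transitions the automaton has, it holds a fixed handful of arrays, which is the kind of GC relief the description is after.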

Attachments

LUCENE-4556.patch (19/Nov/12 00:32, 37 kB, Michael McCandless)
LUCENE-4556.patch (13/Nov/12 14:52, 61 kB, Simon Willnauer)

People

Assignee: Michael McCandless

Reporter: Simon Willnauer

Votes: 0

Watchers: 6

Dates

Created: 13/Nov/12 14:38

Updated: 28/Aug/22 13:32

Resolved: 20/Jun/14 21:40