[LUCENE-2227] separate chararrayset interface from impl - ASF JIRA

Details

Type: Task
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 3.0
Fix Version/s: 4.9, 6.0
Component/s: modules/analysis
Labels:
- dead

Lucene Fields: New

Description

CharArraySet should be abstract, and the hashing implementation currently in use should instead be called CharArrayHashSet.

Currently our 'CharArrayHashSet' is hardcoded across Lucene, but others might want their own implementation.
For example, implementing CharArraySet as a DFA with org.apache.lucene.util.automaton gives faster contains(char[], int, int) performance, because it can 'fast fail' as soon as a character cannot match and need not hash the entire string.

This is useful as it speeds up indexing in StopFilter.

I did not think this would be faster, but I ran benchmarks over and over with the Reuters corpus, and it is, even with English text's weird average word length of 5.
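
A minimal sketch of the proposed split, in plain Java rather than actual Lucene code. All names here (CharArraySetSketch, AbstractCharSet, HashCharSet, TrieCharSet) are hypothetical, and a simple trie stands in for a DFA built with org.apache.lucene.util.automaton. It only illustrates the 'fast fail' idea: the hash variant has to read every character of the term before it can look anything up, while the trie variant can stop at the first character with no transition.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CharArraySetSketch {

    /** Hypothetical abstract contract; the real CharArraySet API is much larger. */
    static abstract class AbstractCharSet {
        abstract boolean contains(char[] text, int off, int len);
    }

    /** Hash-backed variant: hashing reads all len chars before any lookup. */
    static final class HashCharSet extends AbstractCharSet {
        private final Set<String> words = new HashSet<>();

        HashCharSet(List<String> entries) {
            words.addAll(entries);
        }

        @Override
        boolean contains(char[] text, int off, int len) {
            // Allocating a String keeps the sketch short; it is not how Lucene does it.
            return words.contains(new String(text, off, len));
        }
    }

    /** Trie-backed variant, an assumed stand-in for a DFA. */
    static final class TrieCharSet extends AbstractCharSet {
        private static final class Node {
            final Map<Character, Node> next = new HashMap<>();
            boolean accept; // true if a stored word ends at this node
        }

        private final Node root = new Node();

        TrieCharSet(List<String> entries) {
            for (String w : entries) {
                Node n = root;
                for (int i = 0; i < w.length(); i++) {
                    n = n.next.computeIfAbsent(w.charAt(i), c -> new Node());
                }
                n.accept = true;
            }
        }

        @Override
        boolean contains(char[] text, int off, int len) {
            Node n = root;
            for (int i = off; i < off + len; i++) {
                n = n.next.get(text[i]);
                if (n == null) {
                    return false; // fast fail: no need to look at the remaining chars
                }
            }
            return n.accept;
        }
    }

    public static void main(String[] args) {
        List<String> stopWords = Arrays.asList("a", "an", "the", "of");
        char[] buf = "the quick fox".toCharArray();
        AbstractCharSet[] impls = { new HashCharSet(stopWords), new TrieCharSet(stopWords) };
        for (AbstractCharSet set : impls) {
            System.out.println(set.getClass().getSimpleName()
                + " the=" + set.contains(buf, 0, 3)
                + " quick=" + set.contains(buf, 4, 5));
        }
    }
}
```

With an abstract base like this, a caller such as StopFilter would depend only on contains(char[], int, int) and could be handed either implementation. Whether the trie or DFA variant is actually faster depends on the word list and the text, so it would need benchmarking.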

People

Assignee: Unassigned

Reporter: Robert Muir

Votes: 0

Watchers: 0

Dates

Created: 19/Jan/10 12:59

Updated: 28/Aug/22 12:19