  1. Lucene - Core
  2. LUCENE-2227

Separate CharArraySet interface from implementation

    Details

    • Type: Task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.0
    • Fix Version/s: 4.9, 6.0
    • Component/s: modules/analysis
    • Labels:
    • Lucene Fields:
      New

      Description

      CharArraySet should be abstract.
      The hashing implementation currently in use should instead be called CharArrayHashSet.

      Currently our 'CharArrayHashSet' is hardcoded across Lucene, but others might want their own implementation.
      For example, implementing CharArraySet as a DFA with org.apache.lucene.util.automaton gives faster contains(char[], int, int) performance, because it can 'fast fail' on the first character that has no matching transition and need not hash the entire string.
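
To make the proposed split concrete, here is a minimal sketch of an abstract set with two interchangeable implementations. It does not use Lucene's classes: the names SimpleCharArraySet, HashingSet, and TrieSet are hypothetical, and the trie is only a simple stand-in for a DFA built with org.apache.lucene.util.automaton. It illustrates the 'fast fail' behavior, not the benchmark result, which a HashMap-based trie like this one is unlikely to reproduce.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CharArraySetSketch {

    /** Hypothetical abstract base: callers depend only on this contract. */
    static abstract class SimpleCharArraySet {
        abstract boolean contains(char[] text, int off, int len);

        boolean contains(CharSequence cs) {
            char[] chars = cs.toString().toCharArray();
            return contains(chars, 0, chars.length);
        }
    }

    /** Hash-based variant: every character of the term is read to compute the hash. */
    static final class HashingSet extends SimpleCharArraySet {
        private final Set<String> words = new HashSet<>();

        HashingSet(Iterable<String> entries) {
            for (String w : entries) {
                words.add(w);
            }
        }

        @Override
        boolean contains(char[] text, int off, int len) {
            return words.contains(new String(text, off, len));
        }
    }

    /** Trie-based variant (assumed stand-in for a DFA): stops at the first dead end. */
    static final class TrieSet extends SimpleCharArraySet {
        private static final class Node {
            final Map<Character, Node> next = new HashMap<>();
            boolean accept; // true if a stored word ends at this node
        }

        private final Node root = new Node();

        TrieSet(Iterable<String> entries) {
            for (String w : entries) {
                Node n = root;
                for (int i = 0; i < w.length(); i++) {
                    n = n.next.computeIfAbsent(w.charAt(i), c -> new Node());
                }
                n.accept = true;
            }
        }

        @Override
        boolean contains(char[] text, int off, int len) {
            Node n = root;
            for (int i = off; i < off + len; i++) {
                n = n.next.get(text[i]);
                if (n == null) {
                    return false; // fast fail: the remaining characters are never read
                }
            }
            return n.accept;
        }
    }

    public static void main(String[] args) {
        List<String> stopWords = List.of("a", "an", "and", "the");
        SimpleCharArraySet[] impls = { new HashingSet(stopWords), new TrieSet(stopWords) };
        char[] buffer = "the quick fox".toCharArray();
        for (SimpleCharArraySet set : impls) {
            System.out.println(set.getClass().getSimpleName()
                + ": the=" + set.contains(buffer, 0, 3)
                + ", quick=" + set.contains(buffer, 4, 5));
        }
    }
}
```

With a stop word list like the one above, a lookup for "quick" in TrieSet ends after the first character, since no stored word starts with 'q', while HashingSet still has to read all five characters. Code written against the abstract type would not need to change when one implementation is swapped for the other.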

      This is useful as it speeds up indexing in StopFilter.

      I did not think this would be faster, but I ran the benchmarks over and over with the Reuters corpus, and it is, even with English text's weird average word length of 5.

            People

            • Assignee:
              Unassigned
              Reporter:
              rcmuir Robert Muir
