Lucene - Core / LUCENE-2227

Separate CharArraySet interface from impl

Details

    • Type: Task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.0
    • Fix Version/s: 4.9, 6.0
    • Component/s: modules/analysis
    • Lucene Fields: New

    Description

      CharArraySet should be abstract, and the hashing implementation currently in use should instead be called CharArrayHashSet.

      Currently our 'CharArrayHashSet' is hardcoded across Lucene, but others might want their own implementation.
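A minimal sketch of the proposed split follows. Callers depend only on an abstract `CharArraySet` with a `contains(char[], int, int)` method, and the hashing strategy lives in a subclass named `CharArrayHashSet`. Both classes are standalone and hypothetical, not Lucene's actual classes. The hash set is deliberately simplified: open addressing with linear probing, fixed capacity, and no resizing or removal.

```java
import java.util.Arrays;

// Hypothetical abstract type that consumers (e.g. a stop filter) would depend on.
// Not Lucene's real CharArraySet; it only illustrates the proposed interface/impl split.
abstract class CharArraySet {
    /** Returns true if chars[off, off + len) is a member of the set. */
    public abstract boolean contains(char[] chars, int off, int len);

    /** Convenience overload that delegates to the char[] slice version. */
    public boolean contains(CharSequence cs) {
        char[] buf = cs.toString().toCharArray();
        return contains(buf, 0, buf.length);
    }
}

// The hashing strategy, renamed CharArrayHashSet as the issue suggests.
// Simplified: linear probing in a power-of-two table kept at most half full, no resizing.
final class CharArrayHashSet extends CharArraySet {
    private final char[][] entries;
    private final int mask;

    CharArrayHashSet(String... words) {
        // Power of two >= 2 * words.length, so probing always finds an empty slot.
        int cap = Integer.highestOneBit(Math.max(4, words.length * 2)) << 1;
        entries = new char[cap][];
        mask = cap - 1;
        for (String w : words) add(w.toCharArray());
    }

    // Note: hashing reads every character of the slice before any lookup happens.
    private static int hash(char[] c, int off, int len) {
        int h = 0;
        for (int i = off; i < off + len; i++) h = 31 * h + c[i];
        return h;
    }

    private void add(char[] word) {
        int slot = hash(word, 0, word.length) & mask;
        while (entries[slot] != null) {
            if (Arrays.equals(entries[slot], word)) return; // already present
            slot = (slot + 1) & mask;
        }
        entries[slot] = word;
    }

    @Override
    public boolean contains(char[] chars, int off, int len) {
        int slot = hash(chars, off, len) & mask;
        while (entries[slot] != null) {
            char[] e = entries[slot];
            // Range overload of Arrays.equals (Java 9+) compares the slice without copying.
            if (Arrays.equals(e, 0, e.length, chars, off, off + len)) return true;
            slot = (slot + 1) & mask;
        }
        return false;
    }
}
```

Because callers would hold a `CharArraySet` reference, another implementation could be swapped in without touching them.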
      For example, implementing CharArraySet as a DFA with org.apache.lucene.util.automaton gives faster contains(char[], int, int) performance, because it can 'fast fail' at the first character that cannot lead to a match and need not hash the entire string.
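The sketch below illustrates the fast-fail idea. It builds a finite word set as a character trie, which is a deterministic (but unminimized) automaton. A lookup rejects as soon as no transition exists for the next character. This is a swapped-in illustration and does not use org.apache.lucene.util.automaton. The class name `CharArrayTrieSet` and the helper `charsExamined` are hypothetical, and the abstract base is redeclared here so the block stands alone.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstract type, redeclared so this sketch is self-contained.
abstract class CharArraySet {
    public abstract boolean contains(char[] chars, int off, int len);
}

// Hypothetical automaton-style set: a character trie, i.e. an unminimized DFA.
// It does NOT use org.apache.lucene.util.automaton; it only shows the fast-fail behavior.
final class CharArrayTrieSet extends CharArraySet {
    private static final class State {
        final Map<Character, State> next = new HashMap<>();
        boolean accept; // true if the path to this state spells a member word
    }

    private final State start = new State();

    CharArrayTrieSet(String... words) {
        for (String w : words) {
            State s = start;
            for (int i = 0; i < w.length(); i++) {
                s = s.next.computeIfAbsent(w.charAt(i), k -> new State());
            }
            s.accept = true;
        }
    }

    @Override
    public boolean contains(char[] chars, int off, int len) {
        State s = start;
        for (int i = off; i < off + len; i++) {
            s = s.next.get(chars[i]);
            if (s == null) return false; // fast fail: no member has this prefix
        }
        return s.accept;
    }

    /** Counts input chars read before the lookup stops (illustration only). */
    int charsExamined(char[] chars, int off, int len) {
        State s = start;
        int n = 0;
        for (int i = off; i < off + len && s != null; i++, n++) {
            s = s.next.get(chars[i]);
        }
        return n;
    }
}
```

With the stop set {"the", "of", "and"}, looking up "xylophone" reads only its first character before rejecting. A hash lookup would read all nine characters just to compute the hash. A minimized automaton would also share common suffixes, not just prefixes, which is one reason the issue points at Lucene's automaton package. That package's API is not shown here.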

      This is useful because it speeds up indexing with StopFilter.

      I did not expect this to be faster, but I ran the benchmarks over and over with the Reuters corpus, and it is, even with English text's weird average word length of 5.
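To see why the `contains(char[], int, int)` signature matters for a stop filter, consider a tokenizer that hands out slices of a shared `char[]` buffer. The filter can then test each slice in place against any set implementation. The sketch below is a simplified, hypothetical stand-in, not Lucene's StopFilter. It splits on spaces and uses a small functional interface, `CharArrayMembership`, in place of the proposed abstract type. `CharArrayMembership`, `StopWordSketch` and `removeStopWords` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical membership contract standing in for the proposed abstract CharArraySet.
interface CharArrayMembership {
    boolean contains(char[] chars, int off, int len);
}

final class StopWordSketch {
    /**
     * Splits buf on single spaces and keeps tokens that are not stop words.
     * Each token is checked as a slice of buf, so no String is built for rejected tokens.
     */
    static List<String> removeStopWords(char[] buf, CharArrayMembership stopWords) {
        List<String> kept = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= buf.length; i++) {
            if (i == buf.length || buf[i] == ' ') {
                int len = i - start;
                if (len > 0 && !stopWords.contains(buf, start, len)) {
                    kept.add(new String(buf, start, len));
                }
                start = i + 1;
            }
        }
        return kept;
    }
}
```

The filter loop never names a concrete set class, so a hash-based or automaton-based implementation could be passed in unchanged. That is the point of separating the interface from the implementation.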

          People

            Assignee: Unassigned
            Reporter: Robert Muir (rcmuir)
            Votes: 0
            Watchers: 0