Lucene - Core / LUCENE-2247

Add CharArrayMap to Lucene and make CharArraySet a proxy on its keySet()

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      This patch adds a CharArrayMap<V> to Lucene's analysis package as a companion to CharArraySet. Like CharArraySet, it supports fast lookup of char[] keys. This is important for some stemmers and other places in Lucene.

      Stemmers generally use CharArrayMap<String>, whose get(char[]) returns a String. Strings are compact and can easily be copied into the termBuffer. A Map<String,String> would be slow, because the termBuffer would first have to be converted to a String and then looked up. Returning a String is perfectly fine, as it can easily be copied into the termBuffer.
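
      The lookup described above can be illustrated with a minimal sketch. It is not Lucene's implementation: the class name CharArrayMapSketch, the linear-probing table, and the hash function are all assumptions chosen for brevity. It only shows how a slice of a char[] buffer can be looked up without first building a String.

```java
/** Hypothetical, simplified stand-in for the CharArrayMap idea; not Lucene's class. */
final class CharArrayMapSketch<V> {
  // Open-addressing table; the length is always a power of two.
  private char[][] keys = new char[16][];
  private Object[] values = new Object[16];
  private int size;

  /** Looks up text[off .. off+len) directly; no String is allocated. */
  @SuppressWarnings("unchecked")
  V get(char[] text, int off, int len) {
    return (V) values[slot(text, off, len)];
  }

  /** Any object can be a key: its toString() is copied to a char[] once, on insert. */
  @SuppressWarnings("unchecked")
  V put(Object key, V value) {
    char[] k = key.toString().toCharArray();
    int pos = slot(k, 0, k.length);
    V old = (V) values[pos];
    if (keys[pos] == null) {
      keys[pos] = k;
      size++;
    }
    values[pos] = value;
    if (size * 2 > keys.length) {
      rehash(); // keep the load factor at or below 0.5 so probing always terminates
    }
    return old;
  }

  /** Returns the slot holding the key, or the empty slot where it would be stored. */
  private int slot(char[] text, int off, int len) {
    int mask = keys.length - 1;
    int pos = hash(text, off, len) & mask;
    while (keys[pos] != null && !sameChars(keys[pos], text, off, len)) {
      pos = (pos + 1) & mask; // linear probing
    }
    return pos;
  }

  private void rehash() {
    char[][] oldKeys = keys;
    Object[] oldValues = values;
    keys = new char[oldKeys.length * 2][];
    values = new Object[oldValues.length * 2];
    for (int i = 0; i < oldKeys.length; i++) {
      if (oldKeys[i] != null) {
        int pos = slot(oldKeys[i], 0, oldKeys[i].length);
        keys[pos] = oldKeys[i];
        values[pos] = oldValues[i];
      }
    }
  }

  private static int hash(char[] text, int off, int len) {
    int h = 0;
    for (int i = off; i < off + len; i++) {
      h = 31 * h + text[i];
    }
    return h;
  }

  private static boolean sameChars(char[] key, char[] text, int off, int len) {
    if (key.length != len) {
      return false;
    }
    for (int i = 0; i < len; i++) {
      if (key[i] != text[off + i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    CharArrayMapSketch<String> stems = new CharArrayMapSketch<>();
    stems.put("running", "run");
    char[] termBuffer = "running".toCharArray();
    // The buffer is used as the key as-is; only the result is a String.
    System.out.println(stems.get(termBuffer, 0, termBuffer.length));
  }
}
```

      With a Map<String,String>, the same lookup would need a new String(termBuffer, 0, length) for every token before the map could be consulted.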

      This class borrows a lot of code from its Solr counterpart, but has additional features and an API that is more consistent with CharArraySet. The key type is always <?> because, as with CharArraySet, anything that has a toString() representation can be used as a key (with some overhead, of course). It also defines an unmodifiable map and correct iterators (returning the native char[]).

      CharArraySet was made consistent and, for matchVersion>=3.1, now also returns an iterator over char[]. Almost all of CharArraySet's code was moved to CharArrayMap and removed from the set. CharArraySet is now a simple proxy on the map's keySet().
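
      The "set as a proxy on a map" idea can be sketched independently of Lucene. The class below is hypothetical (KeySetProxy and PLACEHOLDER are illustrative names, and a plain java.util.Map stands in for CharArrayMap); it only demonstrates the delegation pattern, which java.util.HashSet also uses on top of HashMap.

```java
import java.util.AbstractSet;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical sketch: a set that stores nothing itself and delegates to a backing map. */
final class KeySetProxy<K> extends AbstractSet<K> {
  // Dummy value stored for every key; the set only cares about the keys.
  private static final Object PLACEHOLDER = new Object();

  private final Map<K, Object> map;

  KeySetProxy(Map<K, Object> map) {
    this.map = map;
  }

  @Override
  public boolean add(K key) {
    // put() returns null when the key was not present before.
    return map.put(key, PLACEHOLDER) == null;
  }

  @Override
  public boolean contains(Object o) {
    return map.containsKey(o);
  }

  @Override
  public Iterator<K> iterator() {
    return map.keySet().iterator();
  }

  @Override
  public int size() {
    return map.size();
  }

  @Override
  public void clear() {
    map.clear();
  }

  public static void main(String[] args) {
    KeySetProxy<String> stopWords = new KeySetProxy<>(new HashMap<>());
    stopWords.add("the");
    stopWords.add("a");
    System.out.println(stopWords.contains("the"));
  }
}
```

      All hashing and lookup logic lives in one place (the map), so the set needs no table of its own.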

      In the future we could think about making CharArraySet/CharArrayMap/CharArrayCollection interfaces, so the whole API would be more consistent with the Java collections API. This would break backwards compatibility, but it would make it possible to use better implementations than hashing (such as prefix trees).

      1. LUCENE-2247.patch
        57 kB
        Uwe Schindler
      2. LUCENE-2247.patch
        56 kB
        Uwe Schindler
      3. LUCENE-2247.patch
        53 kB
        Uwe Schindler
      4. LUCENE-2247.patch
        52 kB
        Uwe Schindler
      5. LUCENE-2247.patch
        52 kB
        Uwe Schindler

        Activity

        Uwe Schindler added a comment -

        Here is the patch.

        To apply, first do:

        svn copy src/java/org/apache/lucene/analysis/CharArraySet.java src/java/org/apache/lucene/analysis/CharArrayMap.java
        

        Have fun!

        Uwe Schindler added a comment -

        Added a CHANGES entry, plus some javadoc improvements and typo fixes. No code changes.

        Robert Muir added a comment -

        +1

        Uwe Schindler added a comment -

        Thanks Robert!

        I only optimized the entrySet() calls so that the "view" is created just once and then cached (without synchronization, of course), as the Java collections API suggests and does.

        I think it is now ready to commit.
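
        The view caching mentioned above can be sketched as follows. This is an illustrative example, not the code from the patch: CachedViewMap, EntryView, and the LinkedHashMap backing store are assumptions. The point is only that entrySet() creates its view object on first use and returns the same object afterwards.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a map that creates its entrySet() view once and caches it. */
final class CachedViewMap<K, V> extends AbstractMap<K, V> {
  private final Map<K, V> backing = new LinkedHashMap<>();

  // Created lazily and deliberately not synchronized: the view holds no state
  // of its own, so a race would at worst create a second, equivalent view.
  private Set<Map.Entry<K, V>> entrySet;

  @Override
  public V put(K key, V value) {
    return backing.put(key, value);
  }

  @Override
  public Set<Map.Entry<K, V>> entrySet() {
    Set<Map.Entry<K, V>> view = entrySet;
    if (view == null) {
      view = new EntryView();
      entrySet = view;
    }
    return view;
  }

  /** Live view: it reads through to the backing map on every call. */
  private final class EntryView extends AbstractSet<Map.Entry<K, V>> {
    @Override
    public Iterator<Map.Entry<K, V>> iterator() {
      return backing.entrySet().iterator();
    }

    @Override
    public int size() {
      return backing.size();
    }
  }

  public static void main(String[] args) {
    CachedViewMap<String, String> map = new CachedViewMap<>();
    Set<Map.Entry<String, String>> view = map.entrySet();
    map.put("running", "run");
    // The view was created before the put, but still sees the new entry.
    System.out.println(view.size() + " " + (view == map.entrySet()));
  }
}
```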

        Uwe Schindler added a comment -

        Improved patch:

        • keySet now returns a CharArraySet view on the map
        • toString() improvements for all views in CharArraySet and CharArrayMap
        • further tests on the views and toString() outputs
        Uwe Schindler added a comment -

        I will commit this in a day or two.

        Uwe Schindler added a comment -

        Here is the latest patch, now committed in revision 906032.

        Uwe Schindler added a comment -

        Thanks Robert for testing!


          People

          • Assignee: Uwe Schindler
          • Reporter: Uwe Schindler
          • Votes: 0
          • Watchers: 0
