Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-2302

Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable)

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      For flexible indexing terms can be simple byte[] arrays, while the current TermAttribute only supports char[]. This is fine for plain text, but e.g NumericTokenStream should directly work on the byte[] array.
      Also TermAttribute lacks of some interfaces that would make it simplier for users to work with them: Appendable and CharSequence

      I propose to create a new interface "CharTermAttribute" with a clean new API that concentrates on CharSequence and Appendable.
      The implementation class will simply support the old and new interface working on the same term buffer. DEFAULT_ATTRIBUTE_FACTORY will take care of this. So if somebody adds a TermAttribute, he will get an implementation class that can be also used as CharTermAttribute. As both attributes create the same impl instance both calls to addAttribute are equal. So a TokenFilter that adds CharTermAttribute to the source will work with the same instance as the Tokenizer that requested the (deprecated) TermAttribute.

      To also support byte[] only terms like Collation or NumericField needs, a separate getter-only interface will be added, that returns a reusable BytesRef, e.g. BytesRefGetterAttribute. The default implementation class will also support this interface. For backwards compatibility with old self-made-TermAttribute implementations, the indexer will check with hasAttribute(), if the BytesRef getter interface is there and if not will wrap a old-style TermAttribute (a deprecated wrapper class will be provided): new BytesRefGetterAttributeWrapper(TermAttribute), that is used by the indexer then.

        Attachments

        1. LUCENE-2302.patch
          53 kB
          Uwe Schindler
        2. LUCENE-2302.patch
          52 kB
          Uwe Schindler
        3. LUCENE-2302.patch
          44 kB
          Uwe Schindler
        4. LUCENE-2302.patch
          44 kB
          Uwe Schindler
        5. LUCENE-2302.patch
          44 kB
          Uwe Schindler
        6. LUCENE-2302-toString.patch
          8 kB
          Uwe Schindler

          Issue Links

            Activity

              People

              • Assignee:
                thetaphi Uwe Schindler
                Reporter:
                thetaphi Uwe Schindler
              • Votes:
                0 Vote for this issue
                Watchers:
                0 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: