  1. Lucene - Core
  2. LUCENE-2183

Supplementary Character Handling in CharTokenizer

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

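      CharTokenizer is the abstract base class for all Tokenizers that operate at the character level. These tokenizers still process char primitives (single UTF-16 code units) rather than int code points, so supplementary characters, which are encoded as surrogate pairs, are not handled correctly. CharTokenizer should operate on code points while preserving backward compatibility.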
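
      The difference between char-based and code-point-based processing can be illustrated with a minimal sketch. This is not the patch attached to this issue and does not use Lucene's API; the class CodePointTokenizerSketch and its two methods are hypothetical names chosen for illustration, and only standard java.lang.Character and String calls are used. Both methods split text into runs of letters, but only the second one sees a surrogate pair as a single character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
class CodePointTokenizerSketch {

    // Char-based splitting: inspects one UTF-16 code unit at a time.
    // Each half of a surrogate pair is tested on its own and is never a letter,
    // so a supplementary letter ends the current token.
    static List<String> tokenizeByChar(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Code-point-based splitting: reads a full code point (one or two chars)
    // and advances by Character.charCount, so surrogate pairs stay intact.
    static List<String> tokenizeByCodePoint(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

      For the input "a\uD801\uDC00b c", where \uD801\uDC00 is the surrogate pair for the supplementary letter U+10400, the char-based method yields the tokens "a", "b", and "c", while the code-point-based method yields "a\uD801\uDC00b" and "c".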

      Attachments

        1. LUCENE-2183.patch
          25 kB
          Simon Willnauer
        2. LUCENE-2183.patch
          58 kB
          Simon Willnauer
        3. LUCENE-2183.patch
          58 kB
          Simon Willnauer
        4. LUCENE-2183.patch
          58 kB
          Simon Willnauer
        5. LUCENE-2183.patch
          60 kB
          Simon Willnauer
        6. LUCENE-2183.patch
          62 kB
          Uwe Schindler
        7. LUCENE-2183.patch
          65 kB
          Simon Willnauer

            People

              Assignee: Uwe Schindler (uschindler)
              Reporter: Simon Willnauer (simonw)
              Votes: 1
              Watchers: 0
