  1. Lucene - Core
  2. LUCENE-1215

Support of Unicode Collation

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      New in Java 6, we have java.text.Normalizer, which supports Unicode Standard Annex #15 normalization.
      http://java.sun.com/javase/6/docs/api/java/text/Normalizer.html
      http://www.unicode.org/unicode/reports/tr15/

      The annex defines four normalization forms: C, D, KC, and KD (NFC, NFD, NFKC, and NFKD). Applying canonical or compatibility decomposition normalizes the representation of a String, so equivalent text compares equal and search results improve.

      I'd like to submit TokenFilter code that supports this feature!
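
      As a rough illustration of the normalization described above, here is a minimal sketch that uses only java.text.Normalizer from the JDK. The class name NormalizationSketch and the helper normalizeTerm are hypothetical and are not taken from the attached NormalizerTokenFilter.java; a real TokenFilter would presumably apply a similar call to each token's text.

```java
import java.text.Normalizer;

public class NormalizationSketch {

    // Hypothetical helper (not from the attached patch): bring a term into
    // the requested Unicode normalization form before indexing or querying.
    public static String normalizeTerm(String term, Normalizer.Form form) {
        // Avoid allocating a new String when the text is already normalized.
        if (Normalizer.isNormalized(term, form)) {
            return term;
        }
        return Normalizer.normalize(term, form);
    }

    public static void main(String[] args) {
        String composed = "\u00E9";               // "é" as a single code point
        String decomposed = "e\u0301";            // "e" followed by a combining acute accent
        String fullwidth = "\uFF21\uFF22\uFF23";  // fullwidth Latin letters A, B, C

        // Canonical forms: NFC composes, NFD decomposes.
        System.out.println(normalizeTerm(decomposed, Normalizer.Form.NFC).equals(composed));
        System.out.println(normalizeTerm(composed, Normalizer.Form.NFD).equals(decomposed));

        // Compatibility forms also fold variants such as fullwidth letters.
        System.out.println(normalizeTerm(fullwidth, Normalizer.Form.NFKC));
    }
}
```

      The canonical forms (NFC, NFD) only unify different encodings of the same character, while the compatibility forms (NFKC, NFKD) additionally fold variants such as fullwidth letters. Which form suits a given index depends on how aggressive the matching should be.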

      Attachments

        1. NormalizerTokenFilter.java
          0.6 kB
          Hiroaki Kawai

        Activity

          People

            Assignee: Unassigned
            Reporter: Hiroaki Kawai (kawai)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved: