  1. Lucene - Core
  2. LUCENE-1545

Standard analyzer does not correctly tokenize combining character U+0364 COMBINING LATIN SMALL LETTER E

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels: None
    • Environment: Linux x86_64, Sun Java 1.6

    Description

      The standard analyzer does not correctly tokenize the combining character U+0364 COMBINING LATIN SMALL LETTER E.
      The word "moͤchte" is incorrectly split into two tokens, "mo" and "chte", and the combining character is lost.
      The expected result is a single token, "moͤchte".
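
The symptom can be reproduced without Lucene. The following is a minimal sketch, not the StandardAnalyzer grammar: it assumes a tokenizer that accepts only letters and digits as token characters, which is one plausible way a combining mark ends up acting as a separator. The class and method names (CombiningMarkDemo, tokenize, isNaiveTokenChar, isMarkAwareTokenChar) are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CombiningMarkDemo {

    // Naive rule (assumption for illustration): only letters and digits belong to a token.
    static boolean isNaiveTokenChar(int cp) {
        return Character.isLetterOrDigit(cp);
    }

    // Mark-aware rule: additionally accept combining marks (Unicode categories Mn, Mc, Me).
    static boolean isMarkAwareTokenChar(int cp) {
        int type = Character.getType(cp);
        return Character.isLetterOrDigit(cp)
                || type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.ENCLOSING_MARK;
    }

    // Splits text into maximal runs of token characters, iterating by code point.
    static List<String> tokenize(String text, boolean markAware) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            boolean keep = markAware ? isMarkAwareTokenChar(cp) : isNaiveTokenChar(cp);
            if (keep) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // A non-token character ends the current token and is dropped.
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String word = "mo\u0364chte"; // "moͤchte" with U+0364 after the 'o'
        System.out.println(tokenize(word, false)); // prints [mo, chte]
        System.out.println(tokenize(word, true));  // prints the single token moͤchte in brackets
    }
}
```

U+0364 has general category Mn (nonspacing mark), so Character.isLetterOrDigit returns false for it and the naive rule drops it as a separator. Accepting mark categories inside a token keeps the word intact.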

      Attachments

        1. AnalyzerTest.java
          0.5 kB
          Andreas Hauser

            People

              Assignee: Steven Rowe (sarowe)
              Reporter: Andreas Hauser (andyhauser)
              Votes: 0
              Watchers: 1
