Lucene - Core
LUCENE-1545

Standard analyzer does not correctly tokenize combining character U+0364 COMBINING LATIN SMALL LETTER E

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels:
      None
    • Environment:

      Linux x86_64, Sun Java 1.6

      Description

      Standard analyzer does not correctly tokenize the combining character U+0364 COMBINING LATIN SMALL LETTER E.
      The word "moͤchte" is incorrectly split into two tokens, "mo" and "chte", and the combining character is lost.
      The expected result is a single token, "moͤchte".
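
The split described above can be reproduced without Lucene. The following is a minimal sketch, assuming a toy word splitter that treats only letters and digits as token characters; `CombiningMarkDemo`, `isCombiningMark`, and `tokenize` are hypothetical names for illustration and are not Lucene's API or the actual fix. U+0364 has general category Mn (non-spacing mark), so `Character.isLetterOrDigit` rejects it and the word breaks at that position unless combining marks are explicitly allowed to continue a token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: a toy word splitter used only
// to illustrate the behaviour reported in this issue.
class CombiningMarkDemo {

    // True for Unicode combining marks (general categories Mn, Mc, Me).
    static boolean isCombiningMark(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    // Splits text into runs of letters and digits. When keepMarks is true,
    // a combining mark that follows a token character extends the token
    // instead of ending it.
    static List<String> tokenize(String text, boolean keepMarks) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            boolean tokenChar = Character.isLetterOrDigit(cp)
                || (keepMarks && current.length() > 0 && isCombiningMark(cp));
            if (tokenChar) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String word = "mo\u0364chte"; // "mo" + U+0364 + "chte"
        System.out.println(tokenize(word, false)); // two tokens: the reported behaviour
        System.out.println(tokenize(word, true));  // one token: the expected behaviour
    }
}
```

With `keepMarks` set to `false` the sketch yields "mo" and "chte", matching the report; with `true` it yields the single token that the reporter expects.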

      1. AnalyzerTest.java
        0.5 kB
        Andreas Hauser

        Issue Links

          Activity

          Andreas Hauser created issue -
          Andreas Hauser made changes -
          Field Original Value New Value
          Attachment AnalyzerTest.java [ 12400612 ]
          Mark Miller made changes -
          Link This issue is part of LUCENE-1488 [ LUCENE-1488 ]
          Mark Miller made changes -
          Fix Version/s 3.0 [ 12312889 ]
          Fix Version/s 2.9 [ 12312682 ]
          Priority Major [ 3 ] Minor [ 4 ]
          Michael McCandless made changes -
          Fix Version/s 3.1 [ 12314025 ]
          Fix Version/s 3.0 [ 12312889 ]
          Robert Muir made changes -
          Component/s contrib/analyzers [ 12312333 ]
          Component/s Analysis [ 12310230 ]
          Robert Muir made changes -
          Link This issue is part of LUCENE-2167 [ LUCENE-2167 ]
          Steve Rowe made changes -
          Assignee Steven Rowe [ steve_rowe ]
          Steve Rowe made changes -
          Fix Version/s 3.1 [ 12314822 ]
          Lucene Fields [New]
          Robert Muir made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Mark Thomas made changes -
          Workflow jira [ 12453058 ] Default workflow, editable Closed status [ 12563782 ]
          Mark Thomas made changes -
          Workflow Default workflow, editable Closed status [ 12563782 ] jira [ 12585322 ]
          Grant Ingersoll made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Shai Erera made changes -
          Component/s modules/analysis [ 12310230 ]
          Component/s contrib/analyzers [ 12312333 ]

            People

            • Assignee:
              Steve Rowe
              Reporter:
              Andreas Hauser
            • Votes:
              0
              Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved:
