Lucene - Core / LUCENE-5927

4.9 -> 4.10 change in StandardTokenizer behavior on \u1aa2


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • None
    • None
    • None
    • None
    • New

    Description

      In 4.9, StandardTokenizer split this string into 2 tokens:
      "\u1aa2\u1a7f\u1a6f\u1a6f\u1a61\u1a72" → "\u1aa2", "\u1a7f\u1a6f\u1a6f\u1a61\u1a72"

      In 4.10, the same string is returned as a single token.
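      The report does not say why the boundary after \u1aa2 disappeared. As a starting point for diagnosis, the following minimal sketch uses only the JDK's java.lang.Character API (not Lucene) to list each code point of the input with its Unicode general category and name. The class and method names are hypothetical, and the output reflects the Unicode tables bundled with the running JDK, which may differ from the Unicode data StandardTokenizer was generated from.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical diagnostic: prints every code point of the string from the
 * report with the JDK's general category and Unicode character name.
 */
public class TaiThamCodePoints {

    // Input string quoted in the issue description.
    static final String INPUT = "\u1aa2\u1a7f\u1a6f\u1a6f\u1a61\u1a72";

    /** Formats a code point as "U+XXXX". */
    static String label(int cp) {
        return String.format("U+%04X", cp);
    }

    /** Returns the "U+XXXX" labels of all code points in s, in order. */
    static List<String> codePointLabels(String s) {
        List<String> labels = new ArrayList<>();
        s.codePoints().forEach(cp -> labels.add(label(cp)));
        return labels;
    }

    /**
     * One line per code point. The type is a Character general-category
     * constant (compare with e.g. Character.OTHER_PUNCTUATION or
     * Character.NON_SPACING_MARK).
     */
    static String describe(int cp) {
        return String.format("%s  type=%d  %s",
                label(cp), Character.getType(cp), Character.getName(cp));
    }

    public static void main(String[] args) {
        INPUT.codePoints().forEach(cp -> System.out.println(describe(cp)));
    }
}
```

      Comparing these categories with the word-break rules StandardTokenizer implements in each release is one way to narrow down which rule change merged the two tokens.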


          People

            Assignee: Unassigned
            Reporter: Ryan Ernst (rjernst)
            Votes: 0
            Watchers: 2
