[LUCENE-5927] 4.9 -> 4.10 change in StandardTokenizer behavior on \u1aa2 - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None
Lucene Fields: New

Description

In 4.9, StandardTokenizer split this string into 2 tokens:
"\u1aa2\u1a7f\u1a6f\u1a6f\u1a61\u1a72" = "\u1aa2", "\u1a7f\u1a6f\u1a6f\u1a61\u1a72"

However, in 4.10 the same string is returned as a single token.
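
To make the input easier to reason about, here is a minimal sketch that lists the code points in the reported string using only the Java standard library. It does not call Lucene and does not reproduce the tokenizer behavior; the class name `InspectInput` and the method `describe` are hypothetical names chosen for illustration, and the output depends on the Unicode version bundled with the JDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: prints what each code point
// in the reported input is, according to the JDK's Unicode tables.
class InspectInput {
    // The input string from the report.
    static final String INPUT = "\u1aa2\u1a7f\u1a6f\u1a6f\u1a61\u1a72";

    // Returns one line per code point: hex value, Unicode script,
    // numeric general-category id (see java.lang.Character constants), and name.
    static List<String> describe(String s) {
        List<String> out = new ArrayList<>();
        s.codePoints().forEach(cp -> out.add(String.format(
                "U+%04X script=%s type=%d name=%s",
                cp,
                Character.UnicodeScript.of(cp),
                Character.getType(cp),
                Character.getName(cp))));
        return out;
    }

    public static void main(String[] args) {
        describe(INPUT).forEach(System.out::println);
    }
}
```

Comparing the general category of the first code point with the rest may help explain where a word-break rule would or would not split the string, but the actual token boundaries have to be checked against StandardTokenizer in each Lucene version.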

People

Assignee: Unassigned
Reporter: Ryan Ernst
Votes: 0
Watchers: 2

Dates

Created: 08/Sep/14 20:11
Updated: 28/Aug/22 14:15
Resolved: 09/Sep/14 15:57