Lucene - Core
LUCENE-461

StandardTokenizer splits Korean words into separate characters

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.9
    • Component/s: modules/analysis
    • Labels: None
    • Environment: Analyzing Korean text with Apache Lucene, esp. with StandardAnalyzer.

    Description

      StandardTokenizer splits Korean words into separate single-character tokens. For example, "안녕하세요" is one Korean word meaning "Hello", but StandardAnalyzer separates it into the five tokens "안", "녕", "하", "세", "요".
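      The difference between the reported behavior and the desired one can be sketched outside Lucene. The class and method names below are illustrative, not Lucene's actual tokenizer code: one mode emits each Hangul syllable as its own token (the bug), the other keeps a contiguous Hangul run together as a single word token (what the patch aims for).

```java
import java.util.ArrayList;
import java.util.List;

public class HangulTokenizerSketch {

    // True for precomposed Hangul syllables (U+AC00 to U+D7A3).
    static boolean isHangul(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.HANGUL_SYLLABLES;
    }

    // splitHangulPerSyllable = true mimics the reported bug (one token per
    // syllable); false mimics the patched behavior (one token per contiguous
    // Hangul run, treated like any other word).
    static List<String> tokenize(String text, boolean splitHangulPerSyllable) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                if (splitHangulPerSyllable && isHangul(c)) {
                    if (current.length() > 0) {
                        tokens.add(current.toString());
                        current.setLength(0);
                    }
                    tokens.add(String.valueOf(c)); // each syllable on its own
                } else {
                    current.append(c); // extend the current word
                }
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // delimiter ends the word
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        // Buggy mode: "안녕하세요" shatters into five tokens.
        System.out.println(tokenize("안녕하세요 Lucene", true));
        // Patched mode: one token per word.
        System.out.println(tokenize("안녕하세요 Lucene", false));
    }
}
```

      The key point is that Hangul syllables satisfy Character.isLetterOrDigit, so grouping consecutive letters already yields whole-word tokens; the per-syllable splitting has to be special-cased, which is why removing that special case fixes the issue.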

      Attachments

        1. StandardTokenizer_KoreanWord.patch
          1 kB
          Cheolgoo Kang
        2. TestStandardAnalyzer_KoreanWord.patch
          0.4 kB
          Cheolgoo Kang

        Activity

          People

            Assignee: Unassigned
            Reporter: Cheolgoo Kang (appler)
            Votes: 0
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved: