[LUCENE-2207] CJKTokenizer generates tokens with incorrect offsets - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.9.1, 3.0
Fix Version/s: 2.9.2, 3.0.1, 4.0-ALPHA
Component/s: modules/analysis
Labels: None

Lucene Fields: New, Patch Available

Description

When I index a multi-valued Japanese document with CJKTokenizer and highlight a term with FastVectorHighlighter, the output snippets highlight the wrong string. I'll attach a program that reproduces the problem soon.
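
To make the symptom concrete, here is a minimal plain-Java sketch of why token offsets matter for a multi-valued field. It does not use any Lucene API, and the report above does not state the root cause, so everything in it is an assumption for illustration: the class `BigramOffsetSketch`, its `Token` type, the `reportFinalOffset` flag, and the idea that each value's tokens are shifted by the final offset reported for the previous values. When that final offset is wrong, tokens from later values point at the wrong characters and a highlighter marks the wrong text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not Lucene code.
class BigramOffsetSketch {

    /** A token plus its character offsets into the concatenated field text. */
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Emits overlapping two-character tokens for one value, shifted by base. */
    static List<Token> bigrams(String value, int base) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i + 2 <= value.length(); i++) {
            out.add(new Token(value.substring(i, i + 2), base + i, base + i + 2));
        }
        return out;
    }

    /**
     * Tokenizes every value of a multi-valued field. The base for the next
     * value advances by the final offset reported for the current one.
     * Assumption: reportFinalOffset=false models a tokenizer that reports 0.
     */
    static List<Token> tokenize(List<String> values, boolean reportFinalOffset) {
        List<Token> all = new ArrayList<>();
        int base = 0;
        for (String value : values) {
            all.addAll(bigrams(value, base));
            base += reportFinalOffset ? value.length() : 0;
        }
        return all;
    }

    /** Returns the text a highlighter would mark for the given token. */
    static String snippet(String fieldText, Token token) {
        return fieldText.substring(token.start, token.end);
    }

    public static void main(String[] args) {
        // Two sample Japanese values, written as Unicode escapes.
        List<String> values = Arrays.asList("\u65e5\u672c\u8a9e", "\u6771\u4eac\u90fd");
        String fieldText = String.join("", values);
        for (boolean ok : new boolean[] {true, false}) {
            for (Token t : tokenize(values, ok)) {
                System.out.println(ok + " " + t.text + " -> " + snippet(fieldText, t));
            }
        }
    }
}
```

With `reportFinalOffset` set to `true`, every token's snippet equals its own text. With `false`, tokens from the second value map back onto characters of the first value, which is the kind of mismatch a highlighter would surface.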

Attachments

- TestCJKOffset.java, 13/Jan/10 16:57, 7 kB, Koji Sekiguchi
- LUCENE-2207.patch, 13/Jan/10 17:48, 0.7 kB, Robert Muir
- LUCENE-2207.patch, 13/Jan/10 17:59, 2 kB, Robert Muir
- LUCENE-2207.patch, 13/Jan/10 18:16, 4 kB, Robert Muir
- LUCENE-2207.patch, 17/Jan/10 03:51, 6 kB, Koji Sekiguchi

Issue Links

is part of: LUCENE-2219 improve BaseTokenStreamTestCase to test end() (Closed)

People

Assignee: Robert Muir

Reporter: Koji Sekiguchi

Votes: 0

Watchers: 0

Dates

Created: 13/Jan/10 16:38

Updated: 28/Aug/22 12:18

Resolved: 17/Jan/10 21:45