[LUCENE-324] org.apache.lucene.analysis.cn.ChineseTokenizer missing offset decrement - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Trivial
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.9
Component/s: modules/analysis
Labels: None
Environment: Operating System: All; Platform: All
Bugzilla Id: 32687

Description

Apparently, in ChineseTokenizer, offset should be decremented along with
bufferIndex when the character type is OTHER_LETTER. The missing decrement
directly affects the startOffset and endOffset values of the tokens.

This is critical for Highlighter to work correctly, because Highlighter marks
matching text based on these offset values.
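
To illustrate the effect, here is a minimal, self-contained sketch of the offset bookkeeping. It is not the Lucene source: the class OffsetSketch, the method tokenize, and the decrementOffset flag are hypothetical names made up for this example, and the loop assumes the tokenizer pushes an OTHER_LETTER character back (by decrementing bufferIndex) when it arrives while an alphanumeric token is still buffered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the offset bookkeeping; not the actual ChineseTokenizer.
class OffsetSketch {

    // Returns tokens formatted as "text[start,end)".
    // decrementOffset=false models the reported bug, true models the proposed fix.
    static List<String> tokenize(String input, boolean decrementOffset) {
        List<String> tokens = new ArrayList<>();
        StringBuilder token = new StringBuilder();
        int bufferIndex = 0; // index of the next character to read
        int offset = 0;      // running count used to compute token offsets
        int start = 0;       // start offset of the token being buffered

        while (bufferIndex < input.length()) {
            char c = input.charAt(bufferIndex++);
            offset++;
            switch (Character.getType(c)) {
                case Character.DECIMAL_DIGIT_NUMBER:
                case Character.LOWERCASE_LETTER:
                case Character.UPPERCASE_LETTER:
                    if (token.length() == 0) start = offset - 1;
                    token.append(Character.toLowerCase(c));
                    break;
                case Character.OTHER_LETTER:
                    if (token.length() > 0) {
                        // Push the character back so it is re-read as its own token.
                        bufferIndex--;
                        // The step the report says is missing.
                        if (decrementOffset) offset--;
                        flush(tokens, token, start);
                    } else {
                        // Each OTHER_LETTER character becomes a single-character token.
                        start = offset - 1;
                        token.append(c);
                        flush(tokens, token, start);
                    }
                    break;
                default:
                    // Any other character ends the current token.
                    flush(tokens, token, start);
                    break;
            }
        }
        flush(tokens, token, start);
        return tokens;
    }

    // Emits the buffered token, if any, and clears the buffer.
    private static void flush(List<String> tokens, StringBuilder token, int start) {
        if (token.length() > 0) {
            tokens.add(token + "[" + start + "," + (start + token.length()) + ")");
            token.setLength(0);
        }
    }

    public static void main(String[] args) {
        String text = "ab\u4e2d\u6587"; // "ab" followed by two CJK characters
        System.out.println("without decrement: " + tokenize(text, false));
        System.out.println("with decrement:    " + tokenize(text, true));
    }
}
```

In this model, the input "ab" followed by two CJK characters yields offsets [0,2), [2,3), [3,4) with the decrement, while without it the two CJK tokens shift to [3,4) and [4,5). A highlighter relying on those offsets would then mark text one character too far to the right.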

Attachments

- ASF.LICENSE.NOT.GRANTED--chinese_tokenizer-missing_offset.patch (0.6 kB, Ray Tsang, 14/Dec/04 18:10)
- ASF.LICENSE.NOT.GRANTED--ChineseTokenizerTest.java (0.8 kB, Ray Tsang, 15/Dec/04 12:01)

People

Assignee: Unassigned

Reporter: Ray Tsang

Votes: 0

Watchers: 0

Dates

Created: 14/Dec/04 18:09

Updated: 28/Aug/22 11:19

Resolved: 05/Dec/05 08:09