Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Invalid
Affects Version/s: 5.3
Fix Version/s: None
Environment: Windows 8.1, Solr 5.3, ZooKeeper 3.4.6, jieba-analysis-1.0.0
Description
When I use the JiebaTokenizerFactory to index Chinese text in Solr, segmentation works fine in the Analysis screen of the Solr Admin UI.
However, when I try highlighting in Solr, the highlights are not placed correctly. For example, when I search for 自然环境与企业本身, it highlights 认<em>为自然环</em><em>境</em><em>与企</em><em>业本</em>身的.
Even when I search for an English word like responsibility, it highlights <em> responsibilit<em>y.
Basically, the highlighting is consistently off by one character/space.
This problem only happens in the content field, not in any other field.
I've made a minor modification to the code in JiebaSegmenter.java, and the highlighting now seems correct.
Basically, I created another int called offset2 in the process() method:
int offset2 = 0;
I then changed offset to offset2 in the relevant part of the code in process().
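To illustrate why a dedicated offset counter fixes this kind of drift, here is a minimal sketch. This is not the actual JiebaSegmenter code (the real patch is in the attachment); the class, method, and segment list below are hypothetical. The idea is that a counter which always equals the index of the next segment's first character, and is advanced only after a token is emitted, produces start/end offsets that map back exactly onto the original text, which is what the highlighter needs.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetSketch {
    // A token with character start/end offsets, as a highlighter consumes them.
    static final class Token {
        final String text;
        final int start;
        final int end;
        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Correct offset tracking: a dedicated counter (named offset2, echoing the
    // described fix) holds the index of the next segment's first character and
    // is advanced only after the token is emitted. Any extra increment before
    // recording the token would shift every offset, giving the off-by-one
    // highlighting reported above.
    static List<Token> tokenize(List<String> segments) {
        List<Token> tokens = new ArrayList<>();
        int offset2 = 0;
        for (String seg : segments) {
            tokens.add(new Token(seg, offset2, offset2 + seg.length()));
            offset2 += seg.length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "自然环境与企业本身";
        // Hypothetical jieba-style segmentation of the query string above.
        List<String> segments = List.of("自然", "环境", "与", "企业", "本身");
        for (Token t : tokenize(segments)) {
            // Each token's offsets must map back onto its own characters in
            // the original string; a shifted counter would break this.
            System.out.println(t.text + " [" + t.start + "," + t.end + ") -> "
                    + input.substring(t.start, t.end));
        }
    }
}
```

Running the sketch, every token's substring of the input equals the token text itself, which is exactly the invariant that the off-by-one bug violates.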
The changes are in the attachment below.