[OAK-4575] Oak 1.0.x fulltext search with ideographic space (U+3000) as separator - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.0.32
Fix Version/s: 1.0.33
Component/s: query
Labels: None

Description

In Oak 1.0, the Lucene index uses its own tokenizer. That tokenizer doesn't support the ideographic space (U+3000) as a word separator.

In Oak 1.2 and later, the Lucene tokenizer is used, which works as expected.

Backporting all relevant changes from Oak 1.2 to the 1.0 branch would require a lot of changes, and the risk of regression would be high (too high in my view). An alternative is to add support for the ideographic space in the query engine by replacing it with a regular space character. Please note that the behavior is still not exactly the same as in Oak 1.2, but it is expected to work correctly for this exact use case.
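
As an illustration only, the replacement described above could look roughly like the following sketch. The class and method names (`IdeographicSpaceNormalizer`, `normalize`) are hypothetical and are not taken from the Oak code base or from the actual fix; the sketch only shows the idea of mapping U+3000 to a regular space before the fulltext search text is tokenized.

```java
// Hypothetical helper, not part of Oak: shows the idea of the workaround only.
final class IdeographicSpaceNormalizer {

    // U+3000 IDEOGRAPHIC SPACE, commonly used between words in CJK text.
    static final char IDEOGRAPHIC_SPACE = '\u3000';

    private IdeographicSpaceNormalizer() {
        // utility class, no instances
    }

    // Returns the search text with every ideographic space replaced by a
    // regular space (U+0020). A null input is returned unchanged.
    static String normalize(String searchText) {
        if (searchText == null) {
            return null;
        }
        return searchText.replace(IDEOGRAPHIC_SPACE, ' ');
    }
}
```

With this sketch, a search text made of two terms separated by U+3000 becomes the same two terms separated by a regular space, so a tokenizer that only splits on regular whitespace would see two words. Text without U+3000 passes through unchanged.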

People

Assignee: Thomas Mueller

Reporter: Thomas Mueller

Votes: 0

Watchers: 2

Dates

Created: 19/Jul/16 08:06

Updated: 22/Aug/16 06:30

Resolved: 21/Jul/16 13:02