[LUCENE-9853] Use CJKWidthCharFilter as the default character normalizer for JapaneseAnalyzer instead of CJKWidthFilter - ASF JIRA

Details

Type: Improvement
Status: Reopened
Priority: Minor
Resolution: Fixed
Affects Version/s: 9.0
Fix Version/s: 9.0
Component/s: modules/analysis
Labels: None
Lucene Fields: New

Description

Follow-up to LUCENE-9413.

We now have CJKWidthCharFilter in analyzers-common. I believe that in many situations it is preferable to apply half-width/full-width character normalization before tokenization, so that analysis results are consistent regardless of character width.

The change slightly affects the analyzer's output. We can provide a parameter to switch back to CJKWidthFilter for backward compatibility.
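
To illustrate why the order of normalization and tokenization can matter, here is a minimal, self-contained sketch. It does not use Lucene: `WidthOrderSketch`, its toy hyphen-splitting `tokenize` method, and `foldFullwidthAscii` are hypothetical names made up for this example, and the folding covers only fullwidth ASCII variants (not the halfwidth Katakana that CJKWidthFilter also handles). It only shows that folding after tokenization can leave tokens that differ from those produced when folding happens first.

```java
import java.util.ArrayList;
import java.util.List;

final class WidthOrderSketch {

    // Fold fullwidth ASCII variants (U+FF01..U+FF5E) to Basic Latin.
    // The two ranges are parallel, so subtracting 0xFEE0 is enough.
    static String foldFullwidthAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(c >= '\uFF01' && c <= '\uFF5E' ? (char) (c - 0xFEE0) : c);
        }
        return sb.toString();
    }

    // Hypothetical toy tokenizer (an assumption, not JapaneseTokenizer):
    // it splits on the ASCII hyphen only and drops empty tokens.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        for (String t : s.split("-")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Token-filter order: tokenize first, then fold each token.
    static List<String> tokenizeThenFold(String s) {
        List<String> out = new ArrayList<>();
        for (String t : tokenize(s)) out.add(foldFullwidthAscii(t));
        return out;
    }

    // Char-filter order: fold the raw text first, then tokenize.
    static List<String> foldThenTokenize(String s) {
        return tokenize(foldFullwidthAscii(s));
    }

    public static void main(String[] args) {
        String halfwidth = "wi-fi";
        String fullwidth = "\uFF57\uFF49\uFF0D\uFF46\uFF49"; // fullwidth form of "wi-fi"
        System.out.println(tokenizeThenFold(halfwidth)); // [wi, fi]
        System.out.println(tokenizeThenFold(fullwidth)); // [wi-fi]
        System.out.println(foldThenTokenize(halfwidth)); // [wi, fi]
        System.out.println(foldThenTokenize(fullwidth)); // [wi, fi]
    }
}
```

In this toy setup, only the fold-first order yields the same tokens for the halfwidth and fullwidth inputs, which is the kind of consistency the description refers to. How much the real JapaneseAnalyzer output changes depends on the tokenizer and dictionary, which this sketch does not model.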

Issue Links

is related to: LUCENE-9413 Add a char filter corresponding to CJKWidthFilter (Closed)

links to: GitHub Pull Request #26

People

Assignee: Tomoko Uchida

Reporter: Tomoko Uchida

Votes: 0

Watchers: 2

Dates

Created: 21/Mar/21 08:46

Updated: 15/Sep/24 22:23

Resolved: 29/Oct/21 02:57

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

50m