Lucene - Core / LUCENE-9853

Use CJKWidthCharFilter as the default character normalizer for JapaneseAnalyzer instead of CJKWidthFilter

Details

    • Type: Improvement
    • Status: Reopened
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 9.0
    • Fix Version/s: 9.0
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New

    Description

      Follow-up issue of LUCENE-9413.

      We now have CJKWidthCharFilter in analyzers-common. I believe that in many situations it is advisable to apply half-width/full-width character normalization before tokenization, so that the tokenizer sees consistent input.

      The change slightly affects the analyzer's output. We can provide a parameter to switch back to CJKWidthFilter for backward compatibility.
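
      Why the order of normalization and tokenization can change the output is sketched below. This is a toy illustration, not Lucene code: the one-word dictionary, the longest-match tokenizer, and all class and method names are hypothetical, and java.text.Normalizer (NFKC) is used as a stand-in for width normalization. NFKC folds more than CJKWidthFilter does, and the real JapaneseTokenizer segments text differently.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class WidthOrderDemo {
    // Hypothetical dictionary with one entry: full-width katakana "ka-me-ra" (camera).
    static final Set<String> DICTIONARY = Set.of("\u30AB\u30E1\u30E9");

    // Stand-in for width normalization (assumption: NFKC, which is broader
    // than what CJKWidthFilter / CJKWidthCharFilter actually fold).
    public static String normalizeWidth(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    // Toy longest-match tokenizer: a dictionary word becomes one token,
    // any other character is emitted as a single-character token.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = i + 1;
            for (int j = text.length(); j > i; j--) {
                if (DICTIONARY.contains(text.substring(i, j))) {
                    end = j;
                    break;
                }
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    // Token-filter order: segment first, then normalize each token.
    public static List<String> tokenizeThenNormalize(String text) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            out.add(normalizeWidth(token));
        }
        return out;
    }

    // Char-filter order: normalize the raw text, then segment it.
    public static List<String> normalizeThenTokenize(String text) {
        return tokenize(normalizeWidth(text));
    }

    public static void main(String[] args) {
        String halfWidth = "\uFF76\uFF92\uFF97"; // half-width katakana "ka-me-ra"
        System.out.println(tokenizeThenNormalize(halfWidth).size()); // prints 3
        System.out.println(normalizeThenTokenize(halfWidth).size()); // prints 1
    }
}
```

      In this sketch the half-width input misses the dictionary when it is segmented first, so normalizing afterwards yields three single-character tokens, while normalizing first yields the single dictionary word. The same ordering effect is the reason the analyzer's output can differ after the change.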

            People

              Assignee: Tomoko Uchida
              Reporter: Tomoko Uchida
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 50m