  1. Lucene - Core
  2. LUCENE-9567

JapanesePartOfSpeechStopFilterFactory should load built-in stop tags by default

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 8.6
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      If JapanesePartOfSpeechStopFilterFactory is created with empty args, it does nothing: it loads no stop tags, and create() simply returns the TokenStream it was given, unchanged.

      As a default behavior, this is trappy, since a user may add the filter without explicitly adding any arguments and assume that it would load a "default" stop set. Or they may assume that if an explicit argument is required then an exception will be thrown. Regardless, "doing nothing" is almost certainly not what the user intended.

      I'm going to attach a patch that loads the default stop tags (using JapaneseAnalyzer.getDefaultStopTags()) if no args are specified, which probably makes sense in 9.0 (as it's consistent with e.g. KoreanPartOfSpeechStopFilterFactory). If we want to apply a fix to 8.x, maybe we should throw an exception to let the user know that the FilterFactory probably isn't doing what they think it's doing?
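
The two options in the description can be sketched as a small, dependency-free model. This is not the attached patch and does not use the Lucene API: the class `StopTagResolution`, the `Mode` enum, and the placeholder tag values are hypothetical, `DEFAULT_STOP_TAGS` merely stands in for `JapaneseAnalyzer.getDefaultStopTags()`, and treating the `tags` argument as an inline comma-separated list is a simplification made only for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of how a factory could resolve stop tags when args are empty. */
final class StopTagResolution {

    /** Stand-in for JapaneseAnalyzer.getDefaultStopTags(); the values are placeholders. */
    static final Set<String> DEFAULT_STOP_TAGS = Collections.unmodifiableSet(
        new LinkedHashSet<>(Arrays.asList("placeholder-tag-1", "placeholder-tag-2")));

    /** LOAD_DEFAULTS models the 9.0 proposal, FAIL_FAST the possible 8.x behavior. */
    enum Mode { LOAD_DEFAULTS, FAIL_FAST }

    static Set<String> resolveStopTags(Map<String, String> args, Mode mode) {
        String explicit = args.get("tags");
        if (explicit != null && !explicit.trim().isEmpty()) {
            // Assumption: tags are given inline, comma-separated, for this sketch only.
            Set<String> tags = new LinkedHashSet<>();
            for (String tag : explicit.split(",")) {
                String trimmed = tag.trim();
                if (!trimmed.isEmpty()) {
                    tags.add(trimmed);
                }
            }
            return tags;
        }
        if (mode == Mode.FAIL_FAST) {
            // Surface the misconfiguration instead of silently filtering nothing.
            throw new IllegalArgumentException(
                "No stop tags configured; the filter would pass every token through unchanged");
        }
        // No explicit tags: fall back to the built-in defaults.
        return DEFAULT_STOP_TAGS;
    }

    public static void main(String[] unused) {
        System.out.println(resolveStopTags(Collections.emptyMap(), Mode.LOAD_DEFAULTS));
    }
}
```

Either branch removes the silent no-op: with empty args the caller gets a non-empty stop set or an immediate error, never a filter that quietly does nothing.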

              People

              • Assignee: Unassigned
              • Reporter: Michael Froh (msfroh)
              • Votes: 0
              • Watchers: 2

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 1h