  1. Lucene - Core
  2. LUCENE-9585

Make preserving original token in CompoundWordTokenFilterBase configurable


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 8.5.1
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      When using a subclass of CompoundWordTokenFilterBase, the filter always outputs the original input token along with the decomposed tokens, if there are any. As a result, documents that originally contained the compound form are indexed with both the compound and the decomposed forms, while documents that originally contained the decomposed form are indexed with the decomposed form only. With this filter, only queries written in the decomposed form match more documents; a query in the compound form still matches only the documents that contained the compound.

      If the filter can also be run at query time, compound forms in the query can be decomposed and match additional documents. To do this, the filter needs to be able to return only the decomposed forms whenever a decomposition exists.
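
      The requested behavior can be illustrated with a minimal plain-Java sketch. It does not use Lucene's API: the class `CompoundDecompositionSketch`, the method `decompose`, and the flag `preserveOriginal` are hypothetical names chosen for illustration, and the brute-force dictionary lookup is only a stand-in for whatever decomposition a real subclass performs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not Lucene code.
class CompoundDecompositionSketch {

    // Returns the tokens a compound-word filter would emit for one input token.
    // preserveOriginal is an assumed flag: true mimics the behavior described
    // in this issue, false is the behavior being requested.
    static List<String> decompose(String token, Set<String> dictionary, boolean preserveOriginal) {
        List<String> subwords = new ArrayList<>();
        // Brute-force scan: every proper substring found in the dictionary is a subword.
        for (int start = 0; start < token.length(); start++) {
            for (int end = start + 1; end <= token.length(); end++) {
                if (end - start == token.length()) {
                    continue; // skip the whole token itself
                }
                String candidate = token.substring(start, end);
                if (dictionary.contains(candidate)) {
                    subwords.add(candidate);
                }
            }
        }
        List<String> out = new ArrayList<>();
        // Keep the original when asked to, or when nothing was decomposed,
        // so that a token is never dropped entirely.
        if (preserveOriginal || subwords.isEmpty()) {
            out.add(token);
        }
        out.addAll(subwords);
        return out;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("foot", "ball");
        System.out.println(decompose("football", dict, true));  // [football, foot, ball]
        System.out.println(decompose("football", dict, false)); // [foot, ball]
        System.out.println(decompose("tennis", dict, false));   // [tennis]
    }
}
```

      With the flag set to false, a query for "football" would be rewritten to "foot" and "ball" and could therefore match documents that only ever contained the decomposed form.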

            People

            • Assignee:
              Unassigned
            • Reporter:
              Geoffrey Lawson
            • Votes:
              0
            • Watchers:
              1

              Dates

              • Created:
                Updated: