Lucene - Core / LUCENE-7536

ASCIIFoldingFilterFactory.getMultiTermComponent can emit two tokens


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.4, 7.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      My understanding is that multi-term analysis is only allowed to normalize tokens. It must not, e.g., remove tokens (as a stop filter does) or add tokens (by tokenizing or by adding synonyms). Yet ASCIIFoldingFilterFactory.getMultiTermComponent returns a factory that emits synonyms (the original, unfolded token in addition to the folded one) if preserveOriginal is set to true on the original filter.
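      A minimal, self-contained Java model of the problem. This is not Lucene code: `fold` only approximates ASCII folding by stripping combining marks after NFD decomposition (it does not handle characters such as ß or æ that ASCIIFoldingFilter maps), and `filter` is a hypothetical stand-in for the per-token output of a folding filter. The model keeps the original only when folding changed the token. It shows that with preserveOriginal set, one input token can come out as two, which multi-term analysis cannot accept.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not Lucene code) of ASCII folding with a
 * preserveOriginal flag, showing how the flag can turn a pure
 * normalization step into one that adds a token.
 */
public class PreserveOriginalDemo {

    /** Approximates ASCII folding: decompose (NFD), then drop combining marks. */
    public static String fold(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    /** Returns the tokens this model emits for a single input token. */
    public static List<String> filter(String token, boolean preserveOriginal) {
        List<String> out = new ArrayList<>();
        String folded = fold(token);
        out.add(folded);
        // In this model the original is kept only if folding actually changed it.
        if (preserveOriginal && !folded.equals(token)) {
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        String term = "caf\u00e9"; // "café" with a precomposed e-acute
        System.out.println(filter(term, false).size()); // one token: the folded form
        System.out.println(filter(term, true).size());  // two tokens: folded and original
    }
}
```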

      This looks like a bug to me, but I'm not entirely sure how to fix it. If the original factory has preserveOriginal set to true, should the multi-term analysis component perform ASCII folding or not?
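      One possible answer, sketched under the assumption that the multi-term component should still fold but never preserve the original. `FoldingFactory`, `apply` and this `getMultiTermComponent` are hypothetical stand-ins that only mirror the shape of the real factory method; this is not the attached patch, and the folding is the same NFD-based approximation rather than ASCIIFoldingFilter's mapping.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not Lucene code): a folding "factory" whose
 * multi-term component folds but never preserves the original token.
 */
public class MultiTermFoldingDemo {

    /** Hypothetical stand-in for a filter factory configured with preserveOriginal. */
    public static final class FoldingFactory {
        final boolean preserveOriginal;

        public FoldingFactory(boolean preserveOriginal) {
            this.preserveOriginal = preserveOriginal;
        }

        /** Tokens emitted for one input token; may be two when preserveOriginal is set. */
        public List<String> apply(String token) {
            // Approximate folding: NFD decomposition, then strip combining marks.
            String folded = Normalizer.normalize(token, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
            List<String> out = new ArrayList<>();
            out.add(folded);
            if (preserveOriginal && !folded.equals(token)) {
                out.add(token);
            }
            return out;
        }

        /** Multi-term variant: same folding, but always exactly one output token. */
        public FoldingFactory getMultiTermComponent() {
            return preserveOriginal ? new FoldingFactory(false) : this;
        }
    }

    public static void main(String[] args) {
        FoldingFactory indexTime = new FoldingFactory(true);
        String term = "caf\u00e9";
        System.out.println(indexTime.apply(term).size());                         // 2
        System.out.println(indexTime.getMultiTermComponent().apply(term).size()); // 1
    }
}
```

      Folding rather than skipping it is one defensible choice: the folded form is indexed whether or not preserveOriginal is set, so a folded wildcard or prefix term can still match. The preserved original mainly lets exact matches score higher, which matters little for multi-term queries since they typically produce constant scores.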

      Attachments

        1. LUCENE-7536.patch (5 kB, Adrien Grand)


          People

            Assignee: Unassigned
            Reporter: Adrien Grand (jpountz)
            Votes: 0
            Watchers: 2
