  1. Lucene - Core
  2. LUCENE-4185

CharFilters being added twice in Solr

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.0-BETA, 6.0
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New

    Description

      Debugging one of my test cases, I found that a TokenStream from an Analyzer constructed by Solr contains the configured chain of CharFilters twice.

      While I may be mistaken, the fix for LUCENE-4142 appears to make the fix for LUCENE-3721 unnecessary, and the combination of the fixes results in the repeated application of the CharFilters.

      I came across this with a test case involving an HTMLStripCharFilter, where the input string contains "&lt;h1>". After passing through one HTMLStripCharFilter, it becomes "<h1>", and then the HTML is removed by the second filter.
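
      The double-application effect described above can be reproduced without Lucene. The following is only a rough sketch: stripHtmlOnce is a hypothetical, simplified stand-in for a single HTMLStripCharFilter pass (it drops tags and decodes three entities in one scan), and the class and method names are assumptions made for illustration, not Lucene or Solr APIs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DoubleCharFilterDemo {

    // Matches either a tag or one of three entities. A single left-to-right scan
    // means text produced by decoding an entity is NOT re-examined in the same pass.
    private static final Pattern TAG_OR_ENTITY =
            Pattern.compile("<[^>]*>|&lt;|&gt;|&amp;");

    // Hypothetical, simplified stand-in for one HTML-stripping char filter pass.
    static String stripHtmlOnce(String input) {
        Matcher m = TAG_OR_ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String token = m.group();
            String replacement;
            switch (token) {
                case "&lt;":  replacement = "<"; break;
                case "&gt;":  replacement = ">"; break;
                case "&amp;": replacement = "&"; break;
                default:      replacement = "";  // a tag: drop it
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Applies each filter in the chain, in order, to the input.
    static String applyChain(List<UnaryOperator<String>> chain, String input) {
        String text = input;
        for (UnaryOperator<String> filter : chain) {
            text = filter.apply(text);
        }
        return text;
    }

    public static void main(String[] args) {
        UnaryOperator<String> strip = DoubleCharFilterDemo::stripHtmlOnce;
        String input = "&lt;h1>Title";

        // Configured chain, applied once: the entity is decoded, nothing is stripped.
        System.out.println(applyChain(Arrays.asList(strip), input));

        // Same chain accidentally wrapped twice: the decoded "<h1>" is now removed.
        System.out.println(applyChain(Arrays.asList(strip, strip), input));
    }
}
```

      Under these assumptions, a single pass yields "<h1>Title" while the doubled chain yields "Title". This is why an escaped tag in the input makes a convenient regression check for the bug: the output differs only when the chain runs twice.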

      Attachments

        1. LUCENE-4185.patch
          10 kB
          Robert Muir

          People

            Assignee: Unassigned
            Reporter: Michael Froh (msfroh)
            Votes: 0
            Watchers: 2
