Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.0-BETA, 6.0
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New

      Description

      Debugging one of my test cases, I found that a TokenStream from an Analyzer constructed by Solr contains the configured chain of CharFilters twice.

      While I may be mistaken, the fix for LUCENE-4142 appears to make the fix for LUCENE-3721 unnecessary, and the combination of the fixes results in the repeated application of the CharFilters.

      I came across this with a test case involving an HTMLStripCharFilter, where the input string contains "&lt;h1>". After passing through one HTMLStripCharFilter, it becomes "<h1>", and then the HTML is removed by the second filter.
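To make the symptom above concrete, here is a minimal sketch of why running an HTML-stripping step twice changes the result. The `stripOnce` method is a hypothetical toy stand-in (remove tags, then decode a few entities); it is not Lucene's HTMLStripCharFilter and does not use any Lucene API.

```java
import java.util.regex.Pattern;

final class DoubleStripDemo {
    // Very rough notion of a "tag": anything between '<' and the next '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Hypothetical stand-in for one pass of an HTML-stripping filter:
    // remove tags first, then decode a handful of character entities.
    static String stripOnce(String input) {
        String withoutTags = TAG.matcher(input).replaceAll("");
        // Decode &amp; last so that "&amp;lt;" is not decoded twice in one pass.
        return withoutTags
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String input = "&lt;h1>";
        String once = stripOnce(input);   // the entity is decoded, nothing is stripped
        String twice = stripOnce(once);   // the decoded text now looks like a tag
        System.out.println("once:  [" + once + "]");
        System.out.println("twice: [" + twice + "]");
    }
}
```

With this toy filter, a single pass yields `<h1>` and a second pass yields the empty string, which mirrors the behaviour reported above.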

        Activity

        Steve Rowe added a comment -

        (edited description to escape the ampersand in "&lt;h1>" so that JIRA readers understand the problem)

        Robert Muir added a comment -

        Thanks for reporting this: you are right, TokenizerChain has a bug where it wraps the already-wrapped reader.

        Here's a patch.
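The attached patch is not reproduced on this page. Purely as an illustration of the double wrapping described in this comment, the sketch below uses hypothetical names (`CountingFilter`, `initReader`, `createComponentsBuggy`, `createComponentsFixed`) and plain `java.io` readers. It assumes the caller has already applied the char filters before handing the reader to the chain; it is not the actual TokenizerChain code.

```java
import java.io.FilterReader;
import java.io.Reader;
import java.io.StringReader;

final class DoubleWrapDemo {
    // Hypothetical char filter that records how many filter layers sit on the reader.
    static final class CountingFilter extends FilterReader {
        final int depth;

        CountingFilter(Reader in) {
            super(in);
            depth = (in instanceof CountingFilter) ? ((CountingFilter) in).depth + 1 : 1;
        }
    }

    // Stand-in for "apply the configured char filter chain" (one filter here).
    static Reader initReader(Reader reader) {
        return new CountingFilter(reader);
    }

    // Buggy variant: receives a reader the caller already wrapped, and wraps it again.
    static Reader createComponentsBuggy(Reader alreadyWrapped) {
        return initReader(alreadyWrapped);
    }

    // Fixed variant: trusts that the caller has already applied the filters.
    static Reader createComponentsFixed(Reader alreadyWrapped) {
        return alreadyWrapped;
    }

    static int depthOf(Reader reader) {
        return (reader instanceof CountingFilter) ? ((CountingFilter) reader).depth : 0;
    }

    public static void main(String[] args) {
        Reader wrapped = initReader(new StringReader("&lt;h1>"));
        System.out.println("buggy depth: " + depthOf(createComponentsBuggy(wrapped)));
        System.out.println("fixed depth: " + depthOf(createComponentsFixed(wrapped)));
    }
}
```

In this sketch the buggy path ends up with two filter layers and the fixed path with one.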

        Hoss Man added a comment -

        hoss20120711-manual-post-40alpha-change


          People

          • Assignee: Unassigned
          • Reporter: Michael Froh
          • Votes: 0
          • Watchers: 3
