Details
Bug
Status: Closed
Major
Resolution: Fixed
4.0-ALPHA
None
New
Description
Debugging one of my test cases, I found that a TokenStream from an Analyzer constructed by Solr contains the configured chain of CharFilters twice.
While I may be mistaken, the fix for LUCENE-4142 appears to make the fix for LUCENE-3721 unnecessary, and the combination of the fixes results in the repeated application of the CharFilters.
I came across this with a test case involving an HTMLStripCharFilter, where the input string contains "&lt;h1&gt;". After passing through one HTMLStripCharFilter, it becomes "<h1>", and then the HTML is removed by the second filter.
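To illustrate why the doubled chain changes the output, here is a minimal sketch in plain Java. It does not use Lucene or Solr: stripOnce is a hypothetical, heavily simplified stand-in for a single HTML-stripping pass (remove tags, then decode the &lt; and &gt; entities), and the class name and sample input are made up for the illustration. The real HTMLStripCharFilter handles far more cases and may differ in whitespace handling.

```java
public class DoubleCharFilterSketch {

    // Hypothetical stand-in for ONE HTML-stripping pass (not the Lucene implementation):
    // first drop anything that looks like a tag, then decode two character entities.
    static String stripOnce(String in) {
        String withoutTags = in.replaceAll("<[^>]*>", "");
        return withoutTags.replace("&lt;", "<").replace("&gt;", ">");
    }

    public static void main(String[] args) {
        String input = "&lt;h1&gt;heading";

        // Expected behaviour: the configured chain is applied once.
        String once = stripOnce(input);

        // Behaviour described in this report: the same chain is applied a second time,
        // so the decoded "<h1>" is now treated as markup and removed.
        String twice = stripOnce(once);

        System.out.println("once:  " + once);
        System.out.println("twice: " + twice);
    }
}
```

With a single pass the escaped text survives as a literal "<h1>", while the second pass treats it as a tag and deletes it, which matches what I saw in the test case.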