Is there a reason why the filter replaces text tags like <b> or <i> with a space?
I see that in the past it wasn't like this (from the code):
//return whitespace from
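It makes life a lot harder when I have, for example, this text:
Some t<b>ex</b>t here
and I want to find "text". Because the tags are replaced by spaces, the word is split into "t", "ex" and "t", so "text" no longer matches.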
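The effect can be reproduced with a minimal Java sketch. It assumes a regex-based tag stripper and a plain whitespace tokenizer; `stripTags` and `tokenize` are hypothetical helpers standing in for the real char filter and tokenizer, not the Solr/Lucene API. It compares replacing inline tags with a space against replacing them with nothing.

```java
import java.util.Arrays;
import java.util.List;

public class InlineTagStripDemo {
    // Hypothetical helper (NOT HTMLStripCharFilter): removes simple tags
    // such as <b>, </b>, <i>, substituting `replacement` for each tag.
    static String stripTags(String html, String replacement) {
        return html.replaceAll("</?[A-Za-z][^>]*>", replacement);
    }

    // Naive whitespace tokenizer standing in for a real Tokenizer.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String input = "Some t<b>ex</b>t here";
        List<String> spaced = tokenize(stripTags(input, " "));
        List<String> joined = tokenize(stripTags(input, ""));
        System.out.println(spaced);                  // [Some, t, ex, t, here]
        System.out.println(joined);                  // [Some, text, here]
        System.out.println(spaced.contains("text")); // false
        System.out.println(joined.contains("text")); // true
    }
}
```

The trade-off cuts the other way for block-level markup: with an empty replacement, `<p>one</p><p>two</p>` would become the single token `onetwo`, which is presumably one reason a filter might emit whitespace for some tags.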
Bulk close for Solr 1.4
I'm seeing a bug related to this patch going in. It's been hard
to track down and I'm dealing with a JVM bug at the same time,
so I haven't had time to write a test case yet.
In summary, when I revert to the previous classes, indexing
goes back to normal.
Committed revision 802263.
Koji, what is the advantage of the HTMLStripCharFilter over HTMLStripReader?
Good question, Shalin.
Since LUCENE-1466 was committed, all tokenizers can read chars from a CharFilter rather than a Reader, so I'd like to replace Readers like this one with CharFilters. The obvious advantages are:
I'll commit in a few days if there are no objections.