I'll commit in a few days if there are no objections.
Koji, what is the advantage of the HTMLStripCharFilter over HTMLStripReader?
Good question, Shalin
Since LUCENE-1466 was committed, all tokenizers can read chars from a CharFilter rather than a Reader, so I'd like to replace Readers like this one with CharFilters. The obvious advantages are:
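One capability a CharFilter has that a plain Reader lacks is offset correction: after markup is stripped, token offsets in the filtered text can be mapped back to positions in the original input (Lucene's CharFilter exposes this as `correctOffset`), which highlighting relies on. The standalone sketch below illustrates only that idea; it does not use Lucene's classes, and `TagStrippingFilter`, `correctStart` and `correctEnd` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical standalone sketch: strip tags while remembering where each kept char came from. */
public class TagStrippingFilter {
    private final StringBuilder output = new StringBuilder();
    // originalOffsets.get(i) is the offset in the original text of filtered char i
    private final List<Integer> originalOffsets = new ArrayList<>();

    public TagStrippingFilter(String input) {
        boolean inTag = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '<') {
                inTag = true;                 // start of a tag: drop it
            } else if (inTag) {
                if (c == '>') inTag = false;  // end of the tag
            } else {
                output.append(c);             // kept char: record its source offset
                originalOffsets.add(i);
            }
        }
    }

    public String filtered() { return output.toString(); }

    /** Maps a token start offset in the filtered text back to the original text. */
    public int correctStart(int start) { return originalOffsets.get(start); }

    /** Maps an exclusive token end offset back; assumes the token is non-empty. */
    public int correctEnd(int end) { return originalOffsets.get(end - 1) + 1; }

    public static void main(String[] args) {
        String html = "Some t<b>ex</b>t here";
        TagStrippingFilter f = new TagStrippingFilter(html);
        String text = f.filtered();             // "Some text here"
        int start = text.indexOf("text");       // 5 in the filtered text
        int end = start + "text".length();      // 9 in the filtered text
        System.out.println(text);
        // Prints the original span, tags included: t<b>ex</b>t
        System.out.println(html.substring(f.correctStart(start), f.correctEnd(end)));
    }
}
```

A plain Reader hands the tokenizer only the filtered chars, so this mapping is lost; keeping it next to the filtered stream is what lets a highlighter mark the right region of the original markup.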
Committed revision 802263.
I'm seeing a bug related to this patch going in. It's been hard
to track down and I'm dealing with a JVM bug at the same time,
so I haven't had time to write a test case yet.
In summary, when I reverted to the previous classes, indexing
went back to normal.
Bulk close for Solr 1.4
Is there a reason why the filter replaces formatting tags like <b> or <i> with a space?
I see that in the past it wasn't like this (from the code):
//return whitespace from
It makes life a lot harder when I have, for example, this text:
Some t<b>ex</b>t here
and I want to find "text"
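The effect can be reproduced with a small regex-based sketch. It is not HTMLStripCharFilter's code: the tag pattern and the `INLINE` set of elements that should not break words are assumptions for illustration. Replacing every tag with a space splits the word, while dropping inline tags silently (and still spacing out other tags such as <p>) keeps "text" as one token.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class InlineTagSketch {
    // Assumption: a hypothetical set of inline elements that should not break words
    private static final Set<String> INLINE = Set.of("b", "i", "em", "strong", "span", "a", "u");
    // Matches an opening or closing tag and captures the element name
    private static final Pattern TAG = Pattern.compile("</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    /** Replaces every tag with a space (the behaviour described in the question). */
    public static String stripAllWithSpace(String html) {
        return TAG.matcher(html).replaceAll(" ");
    }

    /** Removes inline tags entirely; any other tag still becomes a space. */
    public static String stripInlineSilently(String html) {
        return TAG.matcher(html).replaceAll(m ->
                INLINE.contains(m.group(1).toLowerCase()) ? "" : " ");
    }

    /** Naive whitespace tokenizer, standing in for a real Tokenizer. */
    public static List<String> tokenize(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String html = "Some t<b>ex</b>t here";
        System.out.println(tokenize(stripAllWithSpace(html)));   // [Some, t, ex, t, here]
        System.out.println(tokenize(stripInlineSilently(html))); // [Some, text, here]
    }
}
```

The trade-off is that block-level tags still need a space, otherwise "a<p>b" would be indexed as the single token "ab"; the sketch keeps that case by spacing out anything not in `INLINE`.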