Demoting this issue and moving it to 1.1: the current patch is not suitable because it contains LGPL-licensed parts.
RTF parser doesn't compile anymore
This parser correctly handles non-ASCII input.
I think the patch contains some LGPL code that we cannot commit to the Apache repository.
Yes, that does look like a problem. How can we handle this?
I think we should start looking at Apache Tika for most (or all) of our parsers.
RTF parsing is now handled by the TikaPlugin (NUTCH-766). Please open an issue on Tika if the original problem with non-ASCII characters still occurs.