We know where the problem in Jetty is: they buffer 512 chars without respecting surrogates. When they then convert those buffered chars to UTF-8, the output is broken at the buffer boundaries. This bug in Jetty may also affect JSON output, but JSON is much more compact and may not easily hit this buffer issue, as it does not use Strings to feed the writer (the broken method in Jetty is the one handling Writer.write(String,...)).
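To illustrate the kind of failure described above, here is a minimal standalone sketch. It is not Jetty's actual code: the class and method names are made up for the demo, and only the 512-char buffer size is taken from the description. It encodes a String in fixed 512-char chunks, once ignoring surrogate pairs and once moving the chunk boundary so a pair is never split.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; this is NOT Jetty's code, only an illustration.
public class SurrogateBoundaryDemo {
    // Buffer size taken from the description of the bug.
    static final int BUFFER_CHARS = 512;

    // Encodes each 512-char chunk on its own, ignoring surrogate pairs.
    // A pair split across two chunks becomes two lone surrogates, and
    // String.getBytes replaces each of them with '?'.
    static byte[] encodeNaive(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int start = 0; start < s.length(); start += BUFFER_CHARS) {
            int end = Math.min(start + BUFFER_CHARS, s.length());
            byte[] b = s.substring(start, end).getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    // Same chunking, but if a chunk would end on a high surrogate, the
    // boundary is moved back one char so the pair stays together.
    static byte[] encodeSurrogateAware(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int start = 0;
        while (start < s.length()) {
            int end = Math.min(start + BUFFER_CHARS, s.length());
            if (end < s.length() && Character.isHighSurrogate(s.charAt(end - 1))) {
                end--;
            }
            byte[] b = s.substring(start, end).getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
            start = end;
        }
        return out.toByteArray();
    }

    // 511 ASCII chars, then U+1D11E (a surrogate pair), so the high
    // surrogate sits at index 511, the last slot of the first chunk.
    static String sample() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < BUFFER_CHARS - 1; i++) {
            sb.append('a');
        }
        sb.append("\uD834\uDD1E").append('b');
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = sample();
        byte[] expected = s.getBytes(StandardCharsets.UTF_8);
        System.out.println("naive matches:           "
                + Arrays.equals(encodeNaive(s), expected));
        System.out.println("surrogate-aware matches: "
                + Arrays.equals(encodeSurrogateAware(s), expected));
    }
}
```

This also shows why compact output is less likely to trigger the problem: the corruption only happens when a supplementary character happens to straddle a chunk boundary.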
In general, we are discussing whether to stop using the Readers and Writers supplied by the servlet container. As HTTP is a byte-based protocol, code should only use InputStreams and OutputStreams to communicate with the client. Writers and Readers are only provided for convenience with JSP engines.
The input part of Solr no longer uses Readers; it always passes InputStreams around. I uploaded a patch a week ago to do the same on the output side of Solr: SOLR-ServletOutputWriter.patch
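As a rough sketch of the byte-stream approach (this is not the content of the patch; the class and method names are hypothetical), the application can wrap the raw OutputStream in its own JDK OutputStreamWriter so that the UTF-8 encoding is done by the JDK rather than by the container's Writer. A ByteArrayOutputStream stands in here for the stream a servlet would get from ServletResponse.getOutputStream().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names are illustrative and not taken from Solr.
public class StreamResponseSketch {

    // Writes the body through a JDK OutputStreamWriter on top of a raw
    // OutputStream. In a servlet, 'out' would be the stream returned by
    // response.getOutputStream() instead of the container's getWriter().
    static void writeResponse(OutputStream out, String body) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        // Deliberately write in 512-char slices, which may split a
        // surrogate pair between two write calls. The JDK encoder holds
        // back a trailing high surrogate until the next write arrives.
        for (int start = 0; start < body.length(); start += 512) {
            int end = Math.min(start + 512, body.length());
            w.write(body, start, end - start);
        }
        w.flush();
    }

    // Convenience wrapper for the demo: renders into memory.
    static byte[] render(String body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            writeResponse(out, body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 511; i++) {
            sb.append('a');
        }
        // U+1D11E straddles the 512-char slice boundary.
        String body = sb.append("\uD834\uDD1E").append('b').toString();
        byte[] expected = body.getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes match: "
                + java.util.Arrays.equals(render(body), expected));
    }
}
```

The point of the design is that the char-to-byte conversion is under the application's control, so a container bug in its Writer implementation cannot corrupt the output.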
Please note: As JSP pages use Jetty's writers, analysis.jsp may still produce corrupt output.
Can you patch your Solr with that one? Your problems should then disappear for all OutputHandler-generated content except the JSP pages in Solr. We are thinking about optimizing this internally, but the above patch removes all use of the servlet container's Writers in Solr. The patch is against trunk, as far as I know.