IMHO the documentation in XSLT 1.0 (http://www.w3.org/TR/xslt#output) is a bit clearer on the usage of these attributes:
"The method attribute on xsl:output identifies the overall method that should be used for outputting the result tree. The value must be a QName. If the QName does not have a prefix, then it identifies a method specified in this document and must be one of xml, html or text."
"encoding specifies the preferred character encoding that the XSLT processor should use to encode sequences of characters as sequences of bytes; the value of the attribute should be treated case-insensitively; the value must contain only characters in the range #x21 to #x7E (i.e. printable ASCII characters); the value should either be a charset registered with the Internet Assigned Numbers Authority [IANA], [RFC2278] or start with X-"
"media-type specifies the media type (MIME content type) of the data that results from outputting the result tree; the charset parameter should not be specified explicitly; instead, when the top-level media type is text, a charset parameter should be added according to the character encoding actually used by the output method"
If I understand this correctly, this means the correct output specification is <xsl:output method="xml" encoding="utf-8" />, and <xsl:output media-type="text/xml; charset=UTF-8"/> should never be used.
My suggestion would be to change XSLTResponseWriter.getContentType() in such a way that (in pseudocode):
if encoding is null
.. encoding = "utf-8"
end if
if media-type is not null
.. /* next if is for compatibility with the workaround only */
.. if media-type contains "charset="
.... return media-type
.. else
.... return media-type + "; charset=" + encoding
.. end if
else
.. if method is "html" or the first element in the final output is <html>
.... media-type = "text/html"
.. elseif method is "text"
.... media-type = "text/plain"
.. else /* it must be xml */
.... media-type = "text/xml"
.. end if
.. return media-type + "; charset=" + encoding
end if
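
For what it's worth, here is roughly how I imagine that logic in Java. This is only a sketch: the class name ContentTypeSketch, the method resolveContentType and the rootIsHtml flag are made up for illustration and are not part of XSLTResponseWriter. It assumes the method, encoding and media-type values have already been read from xsl:output, and that something else has determined whether the first output element is <html>.

```java
import java.util.Locale;

/** Illustrative only: hypothetical helper, not existing Solr code. */
final class ContentTypeSketch {

    private ContentTypeSketch() {}

    /**
     * Builds a Content-Type value from the xsl:output attributes.
     * Any of method, encoding and mediaType may be null if the
     * stylesheet did not specify them.
     */
    static String resolveContentType(String method, String encoding,
                                     String mediaType, boolean rootIsHtml) {
        // Fall back to utf-8 when the stylesheet gives no encoding.
        String charset = (encoding == null || encoding.isEmpty()) ? "utf-8" : encoding;

        if (mediaType != null && !mediaType.isEmpty()) {
            // Compatibility with the workaround: leave an explicit charset alone.
            if (mediaType.toLowerCase(Locale.ROOT).contains("charset=")) {
                return mediaType;
            }
            return mediaType + "; charset=" + charset;
        }

        // No media-type given: derive one from the output method.
        String base;
        if ("html".equals(method) || rootIsHtml) {
            base = "text/html";
        } else if ("text".equals(method)) {
            base = "text/plain";
        } else {
            base = "text/xml"; // it must be xml
        }
        return base + "; charset=" + charset;
    }
}
```

With this, <xsl:output method="xml" encoding="utf-8" /> would yield "text/xml; charset=utf-8", and a stylesheet that still uses the media-type="text/xml; charset=UTF-8" workaround would get its value back unchanged.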