Details
Type: Bug
Status: Open
Priority: Trivial
Resolution: Unresolved
Affects Version: 1.13
Description
The SVG Transcoder checks for valid XML characters but does not account for characters that the Java String implementation represents as two Java chars (UTF-16 surrogate pairs). Since neither char of such a pair is a valid XML character on its own, the transcoder fails, even though the XML 1.0 specification does allow those characters.
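To make the surrogate-pair behaviour concrete, here is a small standalone sketch using only java.lang. The class name SurrogatePairDemo is made up for illustration and is not part of Batik. It shows that the waving-hand character from the reproduction case occupies two chars, each of which is a surrogate (the range U+D800 to U+DFFF is excluded by the XML 1.0 Char production), while the code point read with codePointAt is the single character U+1F44B.

```java
// Illustrative only: SurrogatePairDemo is a hypothetical name, not Batik code.
final class SurrogatePairDemo {

    // U+1F44B WAVING HAND SIGN, written as its UTF-16 surrogate pair.
    static final String WAVE = "\uD83D\uDC4B";

    public static void main(String[] args) {
        // One user-visible character, but two Java chars.
        System.out.println(WAVE.length());                              // 2
        System.out.println(WAVE.codePointCount(0, WAVE.length()));      // 1

        // charAt returns the individual surrogates, which are not characters on their own.
        System.out.println(Character.isHighSurrogate(WAVE.charAt(0)));  // true
        System.out.println(Character.isLowSurrogate(WAVE.charAt(1)));   // true

        // codePointAt combines the pair into the real character.
        System.out.println(Integer.toHexString(WAVE.codePointAt(0)));   // 1f44b
    }
}
```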
In org.apache.batik.dom.util.DOMUtilities#contentToString, String#codePointAt should be used instead of String#charAt to extract individual characters. With StringBuffer#appendCodePoint, the code points can then be properly appended to the output string. The methods that check for character validity already account for code points.
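The suggested change could look roughly like the sketch below. This is not the actual Batik source: the class CodePointCopy, the method copyValidated, and the local isXmlChar check (a direct transcription of the XML 1.0 Char production) are stand-ins chosen for illustration, and the real contentToString does more than copy characters. Only the iteration pattern is the point: read with codePointAt, validate the code point, append with appendCodePoint, and advance by Character.charCount.

```java
// Illustrative only: CodePointCopy, copyValidated and isXmlChar are hypothetical names.
final class CodePointCopy {

    // XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Copies s code point by code point, rejecting anything that is not a valid XML character.
    static String copyValidated(String s) {
        StringBuffer out = new StringBuffer(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);          // combines a surrogate pair into one code point
            if (!isXmlChar(cp)) {
                throw new IllegalArgumentException("Invalid character at index " + i);
            }
            out.appendCodePoint(cp);            // writes one or two chars as needed
            i += Character.charCount(cp);       // skip both chars of a surrogate pair
        }
        return out.toString();
    }
}
```

With this loop, the text "Hello, world! 👋" is copied unchanged, while an unpaired surrogate is still rejected, because codePointAt returns the lone surrogate value and that value falls outside the allowed ranges.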
Code example to reproduce the issue:
String svgNS = SVGDOMImplementation.SVG_NAMESPACE_URI;
Document doc = SVGDOMImplementation.getDOMImplementation().createDocument(svgNS, "svg", null);
Element text = doc.createElementNS(svgNS, "text");
text.setTextContent("Hello, world! 👋");
doc.getDocumentElement().appendChild(text);
var transcoder = new SVGTranscoder();
TranscoderOutput out = new TranscoderOutput(new OutputStreamWriter(System.out));
TranscoderInput in = new TranscoderInput(doc);
transcoder.transcode(in, out);
throws
Exception in thread "main" java.lang.RuntimeException: IO:Invalid character
  at batik.transcoder@1.13/org.apache.batik.transcoder.svg2svg.SVGTranscoder.transcode(SVGTranscoder.java:179)