Batik / BATIK-1328

No support for unicode characters in U+10000 - U+10FFFF range

Details

    • Type: Bug
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved
    • Affects Version/s: 1.13
    • Fix Version/s: None
    • Component/s: SVG DOM
    • Labels: None

    Description

      The SVG transcoder checks each Java char for XML validity but does not account for characters outside the Basic Multilingual Plane (U+10000 to U+10FFFF), which a Java String stores as two chars (a UTF-16 surrogate pair). Since neither char of such a pair is a valid XML character on its own, the transcoder fails, even though the XML 1.0 specification allows the characters those pairs encode.
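The mismatch described above can be reproduced without Batik. The following is a minimal sketch, not Batik code: the class and the helpers `isXmlChar`, `validPerChar`, and `validPerCodePoint` are hypothetical names introduced for illustration, and `isXmlChar` is a hand-written reading of the XML 1.0 `Char` production.

```java
public class SurrogatePairDemo {
    // Hypothetical helper: the XML 1.0 Char production applied to one value.
    // Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Checks each UTF-16 code unit on its own, as a charAt-based loop would.
    static boolean validPerChar(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isXmlChar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Checks whole code points, so a surrogate pair counts as one character.
    static boolean validPerCodePoint(String s) {
        return s.codePoints().allMatch(SurrogatePairDemo::isXmlChar);
    }

    public static void main(String[] args) {
        String wave = "\uD83D\uDC4B"; // U+1F44B WAVING HAND as a surrogate pair
        System.out.println(wave.length());                         // 2
        System.out.println(wave.codePointCount(0, wave.length())); // 1
        System.out.println(validPerChar(wave));                    // false
        System.out.println(validPerCodePoint(wave));               // true
    }
}
```

The per-char check rejects the string because both halves of the pair fall in the surrogate range U+D800 to U+DFFF, which the `Char` production excludes, while the code point U+1F44B itself is allowed.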

      In org.apache.batik.dom.util.DOMUtilities#contentToString, String#codePointAt should be used instead of String#charAt to extract individual characters. With StringBuffer#appendCodePoint, the code points can then be appended properly to the output string. The methods that check for character validity already account for code points.
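The proposed change could look roughly like the sketch below. It is an assumption-laden illustration, not the actual Batik source: the class name, the reduced set of escapes, the `IllegalArgumentException`, and the local `isXmlChar` helper are all hypothetical, and `StringBuilder` is used in place of `StringBuffer` (both provide `appendCodePoint`).

```java
public class CodePointEscaper {
    // Hypothetical stand-in for Batik's validity check (XML 1.0 Char production).
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Sketch of a code-point-aware escaping loop (not the real contentToString).
    static String contentToString(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            // Reads a full code point; an unpaired surrogate is returned as-is
            // and is then rejected by isXmlChar.
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                throw new IllegalArgumentException("Invalid character at index " + i);
            }
            switch (cp) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                default:  out.appendCodePoint(cp); break;
            }
            // Advance by 1 for BMP characters and by 2 for surrogate pairs.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(contentToString("a < b \uD83D\uDC4B"));
    }
}
```

Advancing the index by `Character.charCount(cp)` is what keeps the loop from visiting the low surrogate as a separate character.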

      Code example to reproduce the issue:

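      import java.io.OutputStreamWriter;
      import org.apache.batik.anim.dom.SVGDOMImplementation;
      import org.apache.batik.transcoder.TranscoderInput;
      import org.apache.batik.transcoder.TranscoderOutput;
      import org.apache.batik.transcoder.svg2svg.SVGTranscoder;
      import org.w3c.dom.Document;
      import org.w3c.dom.Element;

      String svgNS = SVGDOMImplementation.SVG_NAMESPACE_URI;
      Document doc = SVGDOMImplementation.getDOMImplementation().createDocument(svgNS, "svg", null);
      Element text = doc.createElementNS(svgNS, "text");
      // The emoji is U+1F44B, which Java stores as a surrogate pair (two chars).
      text.setTextContent("Hello, world! 👋");
      doc.getDocumentElement().appendChild(text);

      SVGTranscoder transcoder = new SVGTranscoder();
      TranscoderOutput out = new TranscoderOutput(new OutputStreamWriter(System.out));
      TranscoderInput in = new TranscoderInput(doc);
      transcoder.transcode(in, out);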

      throws

      Exception in thread "main" java.lang.RuntimeException: IO:Invalid character
          at batik.transcoder@1.13/org.apache.batik.transcoder.svg2svg.SVGTranscoder.transcode(SVGTranscoder.java:179)

          People

            Assignee: Unassigned
            Reporter: Jasper Krauter (jeeesper)
            Votes: 0
            Watchers: 1

              Time Tracking

                Original Estimate: 2h
                Remaining Estimate: 2h
                Time Spent: Not Specified