XALANJ-2500 (project: XalanJ2)

Terrible performance from ToStream.startPrefixMapping calling flush() repeatedly while serializing XML

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.1
    • Fix Version/s: None
    • Component/s: Serialization
    • Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
    • Labels:
      None
    • Environment:
      N/A (Any)

      Description

      As discussed in XALANJ-78, flush() should only be called from endDocument(). However, ToStream.startPrefixMapping() always calls flushPending(), which, among other things, calls m_writer.flush().

      Here is the relevant portion of a stack trace, with fully qualified class names:

      at org.apache.xml.serializer.WriterToUTF8Buffered.flush(WriterToUTF8Buffered.java:467)
      at org.apache.xml.serializer.ToStream.flushPending(ToStream.java:2975)
      at org.apache.xml.serializer.ToStream.startPrefixMapping(ToStream.java:2340)
      at org.apache.xml.serializer.ToStream.startPrefixMapping(ToStream.java:2299)
      at org.apache.xalan.transformer.TransformerIdentityImpl.startPrefixMapping(TransformerIdentityImpl.java:985)
      at org.apache.xml.serializer.TreeWalker.startNode(TreeWalker.java:317)
      at org.apache.xml.serializer.TreeWalker.traverse(TreeWalker.java:145)
      at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:390)
      at ...
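      For contrast with the stack trace above, here is a hypothetical and heavily simplified SAX serializer that follows the contract from XALANJ-78: startPrefixMapping() only records the pending namespace declaration, and the underlying Writer is flushed solely from endDocument(). The class name EndDocumentFlushingSerializer is made up for illustration. This is not Xalan's ToStream, and it omits escaping, character data, and most other serializer duties.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, deliberately minimal serializer (not Xalan's ToStream):
// no escaping, no character data, no empty-element shortcut. It only
// illustrates the "flush() from endDocument() only" contract.
class EndDocumentFlushingSerializer extends DefaultHandler {
    private final Writer out;
    // Namespace declarations waiting to be written on the next start tag.
    private final Map<String, String> pending = new LinkedHashMap<>();

    EndDocumentFlushingSerializer(Writer out) {
        this.out = out;
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        // Just remember the mapping: no I/O here, and no flush().
        pending.put(prefix, uri);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        StringBuilder tag = new StringBuilder("<").append(qName);
        for (Map.Entry<String, String> e : pending.entrySet()) {
            tag.append(e.getKey().isEmpty() ? " xmlns" : " xmlns:" + e.getKey())
               .append("=\"").append(e.getValue()).append('"');
        }
        pending.clear();
        for (int i = 0; i < atts.getLength(); i++) {
            tag.append(' ').append(atts.getQName(i))
               .append("=\"").append(atts.getValue(i)).append('"');
        }
        write(tag.append('>').toString());
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        write("</" + qName + ">");
    }

    @Override
    public void endDocument() {
        try {
            out.flush(); // the only place this serializer flushes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void write(String s) {
        try {
            out.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

      Because each mapping is buffered until the next startElement(), a document with 100 namespaced children triggers no flush() calls at all until endDocument(), which makes exactly one.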

      Note that some use of XML namespaces appears to be required to trigger this issue. This does not necessarily mean that the output document contains XML namespaces: I first ran into this with an XSL stylesheet that uses XML namespaces for parameter names, while the generated document is entirely within the default namespace.
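      A hypothetical stylesheet of the kind described above might look like the following, driven from Java. The urn:example:params namespace, the p:title parameter, and the class names are invented for illustration. It is an assumption, not verified here, that this particular stylesheet reproduces the extra flushes; the sketch only counts flush() calls so that this can be checked against whichever TransformerFactory is on the classpath. exclude-result-prefixes keeps the p: namespace out of the result, so the output is entirely in the default namespace.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class NamespacedParamDemo {
    // Hypothetical stylesheet: the p: namespace only names a parameter and is
    // excluded from the result, so the output is in the default namespace.
    static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:p='urn:example:params' exclude-result-prefixes='p'>"
        + "<xsl:param name='p:title' select=\"'Hello'\"/>"
        + "<xsl:template match='/'>"
        + "<Root><xsl:value-of select='$p:title'/></Root>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // StringWriter that counts how many times flush() is called on it.
    static final class CountingWriter extends StringWriter {
        int flushes = 0;

        @Override
        public void flush() {
            flushes++;
            super.flush();
        }
    }

    static CountingWriter run() {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            CountingWriter w = new CountingWriter();
            t.transform(new StreamSource(new StringReader("<In/>")), new StreamResult(w));
            return w;
        } catch (Exception e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        CountingWriter w = run();
        System.out.println(w);
        System.out.println("flush() called " + w.flushes + " times");
    }
}
```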

      Below is a sample test case that demonstrates the issue. Here flush() is called 103 times: once for each of the 100 serialized elements that are in an XML namespace, and 3 times at the end of the document. When writing to high-latency outputs, e.g. a remote web client, this results in a severe performance problem.

      import java.io.IOException;
      import java.io.OutputStream;

      import javax.xml.parsers.DocumentBuilder;
      import javax.xml.parsers.DocumentBuilderFactory;
      import javax.xml.transform.Transformer;
      import javax.xml.transform.TransformerFactory;
      import javax.xml.transform.dom.DOMSource;
      import javax.xml.transform.stream.StreamResult;

      import org.w3c.dom.Document;
      import org.w3c.dom.Element;

      public class JavaTest {
          public static void main(String[] args) throws Exception {
              DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
              DocumentBuilder db = dbf.newDocumentBuilder();
              Document doc = db.newDocument();

              // A root element in no namespace, with 100 children in a namespace.
              Element root = doc.createElement("Root");
              doc.appendChild(root);
              for (int i = 0; i < 100; i++) {
                  Element child = doc.createElementNS("http://test.example.com", "Child" + i);
                  root.appendChild(child);
              }

              TransformerFactory tf = TransformerFactory.newInstance();
              Transformer t = tf.newTransformer();

              // Discards all output, but prints a stack trace on every flush() call.
              OutputStream os = new OutputStream() {
                  protected int flushCount = 0;

                  @Override
                  public void write(int b) throws IOException {
                      // Do nothing - this is just a minimal test case.
                  }

                  @Override
                  public void flush() throws IOException {
                      new Throwable("flushed #" + (++flushCount)).printStackTrace();
                  }
              };

              t.transform(new DOMSource(doc), new StreamResult(os));
          }
      }

      Using a Writer instead of an OutputStream results in the same issue; flush() is then called repeatedly on the Writer.
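      A Writer-based variant of the test case above might look like the following sketch, which counts flush() calls on a StringWriter subclass instead of printing stack traces. The class names FlushCountingWriter and WriterFlushTest are hypothetical. The number of flushes it reports depends on which TransformerFactory implementation is on the classpath (Xalan-J 2.7.1 versus, for example, the one bundled with the JDK), so no particular count is assumed here.

```java
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: a StringWriter that counts calls to flush().
class FlushCountingWriter extends StringWriter {
    int flushCount = 0;

    @Override
    public void flush() {
        flushCount++;
        super.flush();
    }
}

class WriterFlushTest {
    // Builds the same document as the OutputStream test case (a Root element
    // with 100 namespaced children) and serializes it to a FlushCountingWriter.
    static FlushCountingWriter run() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("Root");
            doc.appendChild(root);
            for (int i = 0; i < 100; i++) {
                root.appendChild(doc.createElementNS("http://test.example.com", "Child" + i));
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            FlushCountingWriter w = new FlushCountingWriter();
            t.transform(new DOMSource(doc), new StreamResult(w));
            return w;
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        FlushCountingWriter w = run();
        System.out.println("flush() called " + w.flushCount + " times");
    }
}
```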

      The only known workaround is to write and use a subclass of OutputStream or Writer whose overridden flush() silently ignores the call.
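      A minimal sketch of that workaround for the OutputStream case, assuming the caller is responsible for flushing once the transform has finished. FlushIgnoringOutputStream and its realFlush() method are hypothetical names, not part of Xalan or the JDK. It extends java.io.FilterOutputStream, overrides flush() to do nothing, and flushes the wrapped stream only on an explicit realFlush() or on close().

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical workaround class (not part of Xalan or the JDK): ignores
// flush() requests from the serializer and flushes the wrapped stream
// only when realFlush() or close() is called.
class FlushIgnoringOutputStream extends FilterOutputStream {
    private int ignoredFlushes = 0;

    FlushIgnoringOutputStream(OutputStream out) {
        super(out);
    }

    // Pass bulk writes straight through; FilterOutputStream's default
    // implementation would otherwise forward them one byte at a time.
    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
    }

    // Swallow every flush() issued during serialization.
    @Override
    public void flush() {
        ignoredFlushes++;
    }

    // Explicit flush of the underlying stream, e.g. after transform() returns.
    public void realFlush() throws IOException {
        out.flush();
    }

    int getIgnoredFlushes() {
        return ignoredFlushes;
    }

    // FilterOutputStream.close() would call our no-op flush(), so flush the
    // wrapped stream explicitly before closing it.
    @Override
    public void close() throws IOException {
        try {
            out.flush();
        } finally {
            out.close();
        }
    }
}
```

      In the test case above this would be used as new StreamResult(new FlushIgnoringOutputStream(os)), followed by realFlush() or close() on the wrapper once transform() returns. A Writer-based equivalent can extend java.io.FilterWriter in the same way.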

      See also my blog posting at http://blogger.ziesemer.com/2009/05/xalan-j-flushing-serialization.html .


    People

    • Assignee: Unassigned
    • Reporter: Mark A. Ziesemer (ziesemer)
    • Votes: 0
    • Watchers: 1
