For my work, I retrieve a large amount of data as an XML String and I use the DocumentBuilder to parse a ByteArrayInputStream containing this XML. The XML contains many CDATA sections and occasionally, depending upon the data, the document tree will have nodes that contain incorrect data.

I have found that if I put crimson.jar in front of xercesImpl.jar in the classpath, then the document tree comes out OK, but not if xercesImpl.jar is in front of crimson.jar.

Since we use such a large string of XML data, having you reproduce it with our data may be somewhat difficult. However, I was able to write a small program that produces these incorrect results:

    import org.w3c.dom.*;
    import javax.xml.parsers.*;
    import java.io.*;

    class xmltest{
      public static void main(String args[]){
        // Build <LETTERS> with 101 <LETTER> children; child y holds
        // y+1 copies of a single letter inside a CDATA section.
        StringBuffer xml = new StringBuffer();
        xml.append("<LETTERS>");
        for (int y=0;y<=100;y++){
          xml.append("<LETTER><![CDATA[");
          for (int z=0;z<=y;z++)
            xml.append((char)((y%26)+97));
          xml.append("]]></LETTER>");
        }
        xml.append("</LETTERS>");

        byte[] b = xml.toString().getBytes();
        InputStream is = new ByteArrayInputStream(b);
        Document doc = null;
        try {
          if (is!=null){
            DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
            doc = docBuilder.parse(is);
          }
        } catch (Exception e){}

        // Print the value of the first child of each <LETTER> element.
        NodeList nodelist = doc.getDocumentElement().getChildNodes();
        for (int idx=0; idx<nodelist.getLength();idx++){
          Node node = nodelist.item(idx);
          System.out.println(node.getFirstChild().getNodeValue());
        }
      }
    }

At least in my testing, when the nodelist gets to the 65th item, the node value is incorrect. Instead of containing a single repeated letter, the node looks like a concatenation of the contents of many of the other nodes.

Thanks,
Matt Havlovick
Consolidated Freightways
If changing the parser makes the problem go away, this may be a parser bug rather than a Xalan bug. Have you tried running your documents through the Xerces sample programs, to see whether they're parsing correctly?
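One way to check whether a given parser handles the test document correctly, without reading 101 lines of output by eye, is to compare each node against the value the generator should have produced. The following is only a sketch under assumptions: the class name CdataCheck and the helper methods expected, buildXml and countMismatches are hypothetical names introduced here, not part of the original report or of Xerces, and StandardCharsets requires a newer JDK than the one in use when this was reported.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class CdataCheck {
    // Hypothetical helper: the text the y-th LETTER element should hold,
    // i.e. y+1 copies of the letter chosen by (y % 26), as in the report.
    static String expected(int y) {
        StringBuilder sb = new StringBuilder();
        char c = (char) ((y % 26) + 97);
        for (int z = 0; z <= y; z++) {
            sb.append(c);
        }
        return sb.toString();
    }

    // Hypothetical helper: rebuilds the same document as the reported test program.
    static String buildXml() {
        StringBuilder xml = new StringBuilder("<LETTERS>");
        for (int y = 0; y <= 100; y++) {
            xml.append("<LETTER><![CDATA[").append(expected(y)).append("]]></LETTER>");
        }
        return xml.append("</LETTERS>").toString();
    }

    // Parses the document with whichever JAXP parser is first on the classpath
    // and counts LETTER elements whose first child differs from the expected text.
    static int countMismatches() throws Exception {
        byte[] bytes = buildXml().getBytes(StandardCharsets.UTF_8);
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(bytes));
        NodeList letters = doc.getDocumentElement().getChildNodes();
        int mismatches = 0;
        for (int i = 0; i < letters.getLength(); i++) {
            Node first = letters.item(i).getFirstChild();
            String actual = (first == null) ? null : first.getNodeValue();
            if (!expected(i).equals(actual)) {
                System.out.println("item " + i + ": expected " + expected(i) + " but got " + actual);
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) throws Exception {
        // Report which factory implementation was actually picked up.
        System.out.println("factory: " + DocumentBuilderFactory.newInstance().getClass().getName());
        System.out.println("mismatches: " + countMismatches());
    }
}
```

Running this once with crimson.jar first and once with xercesImpl.jar first on the classpath should show which factory is in use and which items, if any, come back wrong. Like the original program, it inspects only the first child of each element, so a parser that splits one CDATA section across several nodes would also be reported as a mismatch.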
Yes, it seems to be a parser bug. The xercesImpl.jar file appears to have the problem, and because it is packaged with the Xalan download, I thought it might belong here as a Xalan bug.
There's nothing wrong with posting it as a Xalan bug as a first guess, but if it's clear that it's a Xerces malfunction, posting it there instead is the only way to get it fixed. Transferring to the Xerces project.