Canonicalization fails for some Latin-2 characters such as 'čćžšđČĆŽŠĐ' (letters with caron, ...). Release 1.3.0 does not have this problem.

Code which demonstrates the bug:

// parse document
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();

// text contains some Latin-2 characters 'čćžšđČĆŽŠĐ'
String text = "<text>\u010D\u0107\u017E\u0161\u0111\u010C\u0106\u017D\u0160\u0110</text>";
Document doc = db.parse(new ByteArrayInputStream(text.getBytes("UTF-8")));
Element e_latin2 = doc.getDocumentElement();

// canonicalize the root element and compare with the UTF-8 input bytes
Canonicalizer20010315WithComments c14 = new Canonicalizer20010315WithComments();
byte[] canon_bin = c14.engineCanonicalizeSubTree(e_latin2);

if (Arrays.equals(text.getBytes("UTF-8"), canon_bin))
    System.out.println("OK");
else
    System.out.println("Failed");
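To narrow down where the canonicalized output diverges, a byte-level comparison may be more informative than a plain OK/Failed. The following is only a diagnostic sketch that uses the JDK alone: the class `Utf8Diff` and its methods `firstMismatch` and `toHex` are hypothetical helpers, not part of the library, and the "actual" bytes here are a deliberately wrong stand-in (the same text encoded as ISO-8859-1), not real canonicalizer output.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper; not part of the canonicalizer library.
public class Utf8Diff {

    // Returns the index of the first differing byte, or -1 if both arrays are equal.
    static int firstMismatch(byte[] expected, byte[] actual) {
        int n = Math.min(expected.length, actual.length);
        for (int i = 0; i < n; i++) {
            if (expected[i] != actual[i]) {
                return i;
            }
        }
        // Equal up to the shorter length: they differ only if the lengths differ.
        return expected.length == actual.length ? -1 : n;
    }

    // Formats bytes as space-separated, two-digit uppercase hex values.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String text = "<text>\u010D\u0107\u017E\u0161\u0111\u010C\u0106\u017D\u0160\u0110</text>";
        byte[] expected = text.getBytes(StandardCharsets.UTF_8);

        // Assumption: stand-in for the canonicalizer output. In the real test,
        // replace this with canon_bin from engineCanonicalizeSubTree(...).
        byte[] actual = text.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("expected: " + toHex(expected));
        System.out.println("actual:   " + toHex(actual));
        System.out.println("first mismatch at byte index: " + firstMismatch(expected, actual));
    }
}
```

In the test case above, passing `text.getBytes("UTF-8")` and `canon_bin` to `firstMismatch` would show the offset of the first wrong byte, which should help tell an encoding problem apart from a structural difference in the output.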
I can't reproduce this with the latest sources. Probably a dup of 41462.

*** This bug has been marked as a duplicate of 41462 ***
Closing old bugs.