Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
1.8
-
None
-
jdk 1.4.2_08, digester 1.8
Description
I need to process an XML file that contains entities, for example:
<?xml version="1.0" encoding="UTF-8"?>
<top>
<body>A&#32;A</body>
</top>
I'm using Digester as follows:
Digester digester = new Digester ();
digester.addRule ("top", new ObjectCreateRule (MyContent.class));
digester.addRule ("top/body", new NodeCreateRule ());
digester.addSetNext ("top/body", "setBody");
then
...
digester.parse (file);
The MyContent class transforms the node into text as follows:
public class MyContent
{
public void setBody (Element node)
...
}
The content displayed in this case is: <body>AA</body>
If the body had been written in the XML file with a literal space, <top><body>A A</body></top>, the content would be displayed correctly as:
<body>A A</body>
Looking at the NodeCreateRule.NodeBuilder.characters() implementation, the following code causes the problem:
String str = new String(ch, start, length);
if (str.trim().length() > 0) {
    top.appendChild(doc.createTextNode(str));
}
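When entities are used, as in the first case, the characters() method is called three times: for 'A', ' ' and 'A'. The whitespace-only chunk fails the trim() check and is dropped, so the space is lost. In the second case, characters() is called once with 'A A'.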
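
To illustrate, here is a minimal sketch that uses only the JDK's SAX API; it is not Digester code. It assumes the space in the first case is written as the character reference &#32;, and the names CharactersChunks, chunks and join are made up for this example. chunks() records every call to characters(), and join() rebuilds the text either with or without the whitespace-only check quoted above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CharactersChunks {

    /** Parses the XML and returns each chunk the parser passed to characters(). */
    static List<String> chunks(String xml) throws Exception {
        final List<String> out = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                out.add(new String(ch, start, length));
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return out;
    }

    /**
     * Concatenates the chunks. With skipBlank set, whitespace-only chunks are
     * dropped, mirroring the str.trim().length() > 0 check from the report.
     */
    static String join(List<String> chunks, boolean skipBlank) {
        StringBuilder sb = new StringBuilder();
        for (String s : chunks) {
            if (skipBlank && s.trim().length() == 0) {
                continue; // this is where a lone ' ' chunk disappears
            }
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the entity in the report is the character reference &#32;
        List<String> c = chunks("<top><body>A&#32;A</body></top>");
        System.out.println("chunks:          " + c);
        System.out.println("skipping blanks: [" + join(c, true) + "]");
        System.out.println("keeping all:     [" + join(c, false) + "]");
    }
}
```

How the text is split across characters() calls is up to the parser, so the printed chunk list may differ between implementations. A handler that appends every chunk, or buffers them until the element ends, does not depend on that split.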