
Key: JELLY-28
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Blocker
Assignee: dion gillard
Reporter: Incze Lajos
Votes: 0
Watchers: 0

Commons Jelly

Bad entity processing

Created: 31/Jan/03 03:44 AM   Updated: 27/Aug/04 04:39 AM
Component/s: core / taglib.core
Affects Version/s: None
Fix Version/s: 1.0-beta-4

Time Tracking:
Not Specified

File Attachments:
  lexical-patches.zip (Zip archive, 2 kB, licensed for inclusion in ASF works), attached 2004-08-27 12:49 AM by Hans Gilde
Environment: No special environment.

Resolution Date: 27/Aug/04 04:39 AM


Description
Create a file named a.xml with this content:

-----------------------------
<?xml version="1.0"?>
<!DOCTYPE a [
<!ENTITY x "y">
]>
<a>&x;</a>
-----------------------------

Run the following simple (Maven) Jelly script:

-----------------------------
<project default="java:jar"
xmlns:j="jelly:core"
xmlns:x="jelly:xml">

<goal name="emnl:test">
<x:parse var="doc" xml="a.xml"/>
<echo><x:copyOf select="$doc"/></echo>
</goal>

</project>
-----------------------------

The result will be this:

-----------------------------
....
emnl:test:
[echo] <?xml version="1.0" encoding="UTF-8"?>
<a>&x;y</a>
BUILD SUCCESSFUL
-----------------------------

I'm aware of the fact that the bug originally comes from dom4j.
The below dom4j program fragment

-----------------------------
....
SAXReader xmlReader = new SAXReader();
Document doc = xmlReader.read("a.xml");
XMLWriter writer = new XMLWriter(System.out);
writer.write(doc);
writer.flush();
....
-----------------------------

will output this result:

-----------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE a><a>&x;</a>
-----------------------------

which is bad (not even well-formed). I've filed this issue
in the dom4j bug tracker
(http://sourceforge.net/tracker/?group_id=16035&atid=116035)
as number 676427, with some notes on the possible
resolution.

But as we can see, the Jelly XML tag library adds a twist to the dom4j bug:
it inserts both the entity reference and its replacement text into the element (&x;y).

Thanks, incze



Comments
Morgan Delagrange added a comment - 09/Sep/03 05:49 PM
This seems like an extremely critical bug.

Paul Libbrecht added a comment - 05/Feb/04 10:22 PM
Allow me to mention that this is so severe that Maven currently prefers to stay
with dom4j-1.4-beta-8 instead of the dom4j-1.4 release.

The bug is definitely floating somewhere around dom4j, and we would welcome help from people who know dom4j well.

thanks.


dion gillard added a comment - 12/Aug/04 06:45 AM
There is a test case in jelly-tags/xml/test/org/apache/commons/jelly/tags/xml/suite.jelly

Hans Gilde added a comment - 20/Aug/04 03:57 AM
The test case reports <test:assertEquals> expected:[y] but was:[&x;].

At least it isn't &x;y.

Given that this is basically a dom4j problem... is it really a blocker for Jelly 1.0b4?


dion gillard added a comment - 20/Aug/04 04:24 AM
For me it's not a blocker, but a known beta issue.

Maarten Coene added a comment - 22/Aug/04 06:06 PM
Hi,

I couldn't replicate this issue with dom4j. If I parse the XML above and print it out, it seems OK to me:

String xml = "<!DOCTYPE a [\n" +
"<!ENTITY x \"y\">\n" +
"]>\n" +
"<a>&x;</a>";
Document doc = DocumentHelper.parseText(xml);
System.out.println(doc.asXML());

this gives the following output:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE a><a>y</a>

could someone give me some hints about how to create a dom4j junit test that illustrates this problem?

thanks,
Maarten


dion gillard added a comment - 22/Aug/04 11:23 PM
Did you try the original DOM4J code above?

e.g.

SAXReader xmlReader = new SAXReader();
Document doc = xmlReader.read("a.xml");
XMLWriter writer = new XMLWriter(System.out);
writer.write(doc);
writer.flush();


Maarten Coene added a comment - 23/Aug/04 03:43 PM
Trying the original dom4j code gives as output:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE a><a>y</a>

seems ok to me
Maarten


Hans Gilde added a comment - 24/Aug/04 03:45 AM
The problem seems to be in the string() XPath function. I don't know if it's a problem with the way it's used or with dom4j.

Here's the code that replicates what we're doing in Jelly:

SAXReader xmlReader = new SAXReader(true);
Document doc = xmlReader.read("src/dom4jtest/a.xml");
Object obj = doc.selectObject("string(/a)");
System.out.println(obj);
XMLWriter writer = new XMLWriter(System.out);
writer.write(doc);
writer.flush();

This prints "&x;" .... and maybe it's supposed to, I couldn't find a good definition of string() applied to a node set in my XPath book.
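For reference, XPath 1.0 defines string() applied to a node-set as the string-value of the first node in document order, and an element's string-value is the concatenation of all its descendant text nodes; by that definition string(/a) should give "y", not "&x;". As a rough cross-check, here is a minimal sketch using only the JDK's built-in DOM parser and javax.xml.xpath. It does not use dom4j or Jaxen, so it does not exercise the code path Jelly uses; XPathStringValueDemo and stringValue are made-up names for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathStringValueDemo {
    // The a.xml document from the description, inlined as a string.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE a [\n<!ENTITY x \"y\">\n]>\n"
        + "<a>&x;</a>";

    // Parses xml into a DOM and evaluates expr, converting the result to a string.
    static String stringValue(String xml, String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // true is already the default; set explicitly to show that the
            // entity reference is replaced by its text ("y") in the DOM.
            dbf.setExpandEntityReferences(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(stringValue(XML, "string(/a)")); // prints y
    }
}
```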

FYI to Jelly people, the condition that initially opened this TR isn't broken any more. Outputting that code works fine.


Hans Gilde added a comment - 24/Aug/04 01:13 PM
Ok, more info here:

First, in my previous post, replace "src/dom4jtest/a.xml" with "entity.xml", the XML file that's in the unit test.

Second, it seems that we have two distinct issues here:

1) The unit test that fails, which is related to the string() XPath function.

2) The original post, which does, in fact print bad output. I think that the problem is with XML CopyOfTag line 52 "...new SAXWriter(output, output)". It's using the output as both a ContentHandler and a LexicalHandler. The &x; is being output in the startEntity method of LexicalHandler. The SAX description for this method says "General entities are reported with their regular names". So, I think that we want only the ContentHandler, which would change line 52 to "...new SAXWriter(output)".

Here's the Java source that replicates the problem:

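import java.util.Iterator;
import java.util.List;

import junit.framework.TestCase;

import org.dom4j.Document;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;
import org.dom4j.io.SAXWriter;
import org.dom4j.io.XMLWriter;

public class TestDom4jEntity extends TestCase {
    public void testDom4JEntityParsing() throws Exception {
        SAXReader xmlReader = new SAXReader(true);
        Document doc = xmlReader.read("src/dom4jtest/a.xml");

        Object obj = doc.selectObject("string(/a)");
        System.out.println("Bad: " + obj);

        XMLWriter writer = new XMLWriter(System.out);
        // broken: writer receives both ContentHandler and LexicalHandler events
        SAXWriter saxWriter = new SAXWriter(writer, writer);
        // fixed: writer receives ContentHandler events only
        // SAXWriter saxWriter = new SAXWriter(writer);
        List nodes = doc.selectNodes("/a");
        for (Iterator iter = nodes.iterator(); iter.hasNext();) {
            Object object = iter.next();
            if (object instanceof Node) {
                saxWriter.write((Node) object);
            } else if (object != null) {
                System.out.println(object.toString());
            }
        }
        // XMLWriter buffers its output, so flush to see the result
        writer.flush();
    }
}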
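The CopyOfTag diagnosis above can also be illustrated without dom4j or Jelly. This is a stand-in sketch, not dom4j's SAXWriter or XMLWriter: it assumes the JDK's built-in (Xerces-based) SAX parser, which reports the internal general entity x through LexicalHandler.startEntity and then delivers its replacement text through ContentHandler.characters. A handler that writes both, like an XMLWriter passed as both arguments to new SAXWriter(output, output), should produce &x;y; registering it only as the ContentHandler should produce plain y. LexicalEntityDemo and echo are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class LexicalEntityDemo {
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE a [\n<!ENTITY x \"y\">\n]>\n"
        + "<a>&x;</a>";

    // Re-serializes elements and text. When registerLexical is true, the same
    // handler also receives LexicalHandler events, like SAXWriter(output, output).
    static String echo(String xml, boolean registerLexical) {
        StringBuilder out = new StringBuilder();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                out.append('<').append(qName).append('>');
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                out.append("</").append(qName).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The expanded entity text ("y") always arrives here.
                out.append(ch, start, length);
            }

            @Override
            public void startEntity(String name) {
                // Writing the entity name on top of its expanded text yields "&x;y".
                // Skip "[dtd]" (external subset) and "%name" (parameter entities).
                if (!name.startsWith("[") && !name.startsWith("%")) {
                    out.append('&').append(name).append(';');
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            if (registerLexical) {
                parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            }
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(echo(XML, true));  // <a>&x;y</a>
        System.out.println(echo(XML, false)); // <a>y</a>
    }
}
```

SAXParser.parse(InputSource, DefaultHandler) wires up the content, DTD, entity-resolver and error handlers only, which is why the lexical handler has to be set as a property to get the broken behaviour.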


dion gillard added a comment - 24/Aug/04 01:39 PM
We need the lexical handler to allow us access to stuff like
select="$doc/foo/comment()"

Hans Gilde added a comment - 24/Aug/04 11:19 PM
So, when you transform your XML document, it's supposed to strip out the original comments. If you want comments in your output, you have to put them in CDATA in the original.

How about this:

By default, CopyOf won't output lexical data. If you want to access the comments in this mode, you have to use "string($doc/a/comment())"

We add in an attribute "lexical" that will cause CopyOf to output lexical information along with the content. In this mode, you should expect the "double entity" issue and any other issue related to outputting lexical data.

It's a little inconvenient for the user but the underlying SAX/dom4j classes work this way.

I'll submit the patch tomorrow unless you hate it.
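The proposal above boils down to whether the copy also listens for LexicalHandler events. A minimal sketch of that trade-off in plain JDK SAX, assuming the same built-in parser behaviour as before; copy and its lexical parameter are hypothetical stand-ins, not Jelly's CopyOfTag API. With lexical off, the comment is dropped and the entity appears once, as its text; with it on, the comment is kept but the entity name is echoed next to its replacement text, which is the "double entity" case.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class LexicalFlagSketch {
    // Same entity as a.xml, plus a comment inside <a>.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE a [\n<!ENTITY x \"y\">\n]>\n"
        + "<a><!--note-->&x;</a>";

    // Hypothetical CopyOf-style copy; lexical decides whether comments and
    // entity boundaries (LexicalHandler events) are passed through.
    static String copy(String xml, boolean lexical) {
        StringBuilder out = new StringBuilder();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                out.append('<').append(qName).append('>');
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                out.append("</").append(qName).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }

            @Override
            public void comment(char[] ch, int start, int length) {
                // Comments are only ever reported to a LexicalHandler.
                out.append("<!--").append(ch, start, length).append("-->");
            }

            @Override
            public void startEntity(String name) {
                if (!name.startsWith("[") && !name.startsWith("%")) {
                    out.append('&').append(name).append(';');
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            if (lexical) {
                parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            }
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(copy(XML, false)); // <a>y</a>
        System.out.println(copy(XML, true));  // <a><!--note-->&x;y</a>
    }
}
```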


dion gillard added a comment - 24/Aug/04 11:49 PM
This sounds doable.

So, by default, select=".../comment()" won't work unless lexical is set to true, if I'm reading it right.


Hans Gilde added a comment - 27/Aug/04 12:49 AM
Patches to the XML CopyTag and CopyOfTag to add the "lexical" attribute. The CopyOfTag patch also replaces the deprecated dom4j selectObject method with evaluate. Also a patch to the XML test suite.jelly to test the changes.

Hans Gilde added a comment - 27/Aug/04 12:52 AM
select=".../comment()" won't work when lexical="false". Specifically, the select will return the comments to the tag but they won't be output.

dion gillard added a comment - 27/Aug/04 04:39 AM
Applied successfully