Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
2.8.0
None
Description
I parse my XML document using the Xerces DOMParser. The internal subset is perfectly intact in the resulting DOM until I call Document.cloneNode(true). When I print the nodes, the document type looks like this, first before the clone (expected) and then after it (actual):
Expected:
DocumentTypeImpl: name=document
internalSubset=
<!ENTITY erh "Elliotte Rusty Harold">
<!ELEMENT document (title, signature)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT hr EMPTY>
<!ELEMENT lastmodified (#PCDATA)>
<!ELEMENT signature (hr, copyright, email, lastmodified)>
Actual:
DocumentTypeImpl: name=document
EntityImpl: name=erh
TextImpl: Elliotte Rusty Harold
As you can see, Document.cloneNode(true) appears to turn the <!ENTITY> declaration in the internal subset into an actual Entity node, while the rest of the internal subset (the <!ELEMENT> declarations) is discarded. This makes the cloned document invalid, since it no longer carries the DTD information that the original document had.
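The reproduction above can be sketched in Java roughly as follows. This is a minimal sketch, assuming the parser is obtained through JAXP's DocumentBuilderFactory rather than by instantiating org.apache.xerces.parsers.DOMParser directly; on a stock JDK that factory resolves to the JDK's internal Xerces-derived parser, whose cloning behaviour may differ from Xerces-J 2.7/2.8. The class name CloneInternalSubsetRepro, the sample document body, and the describe helper are hypothetical. The report's own printing code is not shown, so describe only approximates it by listing the doctype name, getInternalSubset(), and the entity names exposed through getEntities().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

public class CloneInternalSubsetRepro {

    // Hypothetical test document whose internal subset mirrors the one in the report.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE document [\n"
        + "<!ENTITY erh \"Elliotte Rusty Harold\">\n"
        + "<!ELEMENT document (title, signature)>\n"
        + "<!ELEMENT title (#PCDATA)>\n"
        + "<!ELEMENT copyright (#PCDATA)>\n"
        + "<!ELEMENT email (#PCDATA)>\n"
        + "<!ELEMENT hr EMPTY>\n"
        + "<!ELEMENT lastmodified (#PCDATA)>\n"
        + "<!ELEMENT signature (hr, copyright, email, lastmodified)>\n"
        + "]>\n"
        + "<document><title>Test</title><signature><hr/>"
        + "<copyright>&erh;</copyright><email>someone@example.com</email>"
        + "<lastmodified>today</lastmodified></signature></document>";

    // Parses the string with whatever DOM implementation JAXP provides.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Summarises the doctype: its name, its internal subset string,
    // and the names of the entities in its entities map.
    static String describe(Document doc) {
        DocumentType dt = doc.getDoctype();
        if (dt == null) {
            return "no DocumentType\n";
        }
        StringBuilder sb = new StringBuilder();
        sb.append("name=").append(dt.getName()).append('\n');
        sb.append("internalSubset=").append(dt.getInternalSubset()).append('\n');
        NamedNodeMap entities = dt.getEntities();
        for (int i = 0; i < entities.getLength(); i++) {
            sb.append("entity=").append(entities.item(i).getNodeName()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Document original = parse(XML);
        // Deep clone of the whole document, the call the report says loses the DTD.
        Document copy = (Document) original.cloneNode(true);
        System.out.println("Before clone:\n" + describe(original));
        System.out.println("After clone:\n" + describe(copy));
    }
}
```

Comparing the two printed summaries shows whether the deep clone kept the internal subset. On an affected build, the "After clone" summary would be expected to print internalSubset=null while still listing the erh entity, matching the "Actual" output above.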
I applied a small patch to CoreDocumentImpl (attached), and now it works as expected, except that in addition to the internal subset, an Entity node now exists as a child of the DocumentType, which is odd. I'm not sure whether that is valid. It didn't exist in the DOM before Document.cloneNode(true), so it seems to me it shouldn't be there; however, if it doesn't hurt anything, it doesn't matter much to me. Anyway, after my patch, here is the new result:
DocumentTypeImpl: name=document
internalSubset=
<!ENTITY erh 'Elliotte Rusty Harold'>
<!ELEMENT document (title,signature)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT hr EMPTY>
<!ELEMENT lastmodified (#PCDATA)>
<!ELEMENT signature (hr,copyright,email,lastmodified)>
EntityImpl: name=erh
TextImpl: Elliotte Rusty Harold
I hope this can get applied in time for the next release of Xerces!
Jake