Xerces2-J / XERCESJ-1181

internal subset lost after using cloneNode (patch provided!)


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.8.1
    • Component/s: DOM (Level 3 Core)
    • Labels: None

    Description

      I parse my XML document using the Xerces DOMParser. The internal subset is
      perfectly intact in the resulting DOM until I call Document.cloneNode(true). When I
      print the nodes, here's what the document type looks like, first before the
      clone (expected) and then after (actual):

      Expected:

      DocumentTypeImpl: name=document
      internalSubset=
      <!ENTITY erh "Elliotte Rusty Harold">
      <!ELEMENT document (title, signature)>
      <!ELEMENT title (#PCDATA)>
      <!ELEMENT copyright (#PCDATA)>
      <!ELEMENT email (#PCDATA)>
      <!ELEMENT hr EMPTY>
      <!ELEMENT lastmodified (#PCDATA)>
      <!ELEMENT signature (hr, copyright, email, lastmodified)>

      Actual:

      DocumentTypeImpl: name=document
      EntityImpl: name=erh
      TextImpl: Elliotte Rusty Harold

      As you can see, Document.cloneNode(true) seems to turn the <!ENTITY> declaration
      in the internal subset into an actual Entity node, while the rest of the internal
      subset (the <!ELEMENT> declarations) is discarded. This makes the cloned document
      invalid, since the DTD information present in the original document is gone.
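      A minimal sketch for reproducing this, assuming the standard JAXP DocumentBuilderFactory rather than constructing org.apache.xerces.parsers.DOMParser directly. It parses a document with the same internal subset as above and prints DocumentType.getInternalSubset() before and after Document.cloneNode(true). The class name CloneNodeRepro, the helper internalSubsets, and the element content inside SAMPLE are hypothetical; only the DOCTYPE mirrors this report. Which parser JAXP hands back depends on the classpath: with a Xerces2-J jar on it you get that build, otherwise typically the JDK's bundled copy of Xerces, so the "after" value depends on which version is actually loaded.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class CloneNodeRepro {

    // Hypothetical input: the DOCTYPE copies the report's internal subset,
    // the element content is made up just so the document is well-formed.
    public static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE document [\n"
        + "<!ENTITY erh \"Elliotte Rusty Harold\">\n"
        + "<!ELEMENT document (title, signature)>\n"
        + "<!ELEMENT title (#PCDATA)>\n"
        + "<!ELEMENT copyright (#PCDATA)>\n"
        + "<!ELEMENT email (#PCDATA)>\n"
        + "<!ELEMENT hr EMPTY>\n"
        + "<!ELEMENT lastmodified (#PCDATA)>\n"
        + "<!ELEMENT signature (hr, copyright, email, lastmodified)>\n"
        + "]>\n"
        + "<document><title>Test</title><signature><hr/>"
        + "<copyright>&erh;</copyright><email>someone@example.com</email>"
        + "<lastmodified>today</lastmodified></signature></document>";

    /** Parses xml and returns {internal subset of the original, internal subset of its deep clone}. */
    public static String[] internalSubsets(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document original = builder.parse(new InputSource(new StringReader(xml)));
            // Deep-clone the whole Document, which is the call that loses the subset in 2.8.0.
            Document copy = (Document) original.cloneNode(true);
            return new String[] { subsetOf(original), subsetOf(copy) };
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse the sample document", e);
        }
    }

    // Returns null when the document has no DOCTYPE or the DOCTYPE has no internal subset.
    private static String subsetOf(Document doc) {
        DocumentType doctype = doc.getDoctype();
        return doctype == null ? null : doctype.getInternalSubset();
    }

    public static void main(String[] args) {
        String[] subsets = internalSubsets(SAMPLE);
        System.out.println("before cloneNode(true):\n" + subsets[0]);
        System.out.println("after cloneNode(true):\n" + subsets[1]);
    }
}
```

      On an affected build the second value is expected to lack the internal subset, as in the Actual output; on a fixed build both values should carry the same declarations.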

      I applied a small patch to CoreDocumentImpl (attached), and now it works as expected, except that in addition to the internal subset, the Entity node also appears as a child of the DocumentType, which is odd. I'm not sure whether that's valid. It didn't exist in the DOM before Document.cloneNode(true), so it seems to me it shouldn't be there, but if it doesn't hurt anything, it doesn't matter much to me. Here's the result after my patch:

      DocumentTypeImpl: name=document
      internalSubset=
      <!ENTITY erh 'Elliotte Rusty Harold'>
      <!ELEMENT document (title,signature)>
      <!ELEMENT title (#PCDATA)>
      <!ELEMENT copyright (#PCDATA)>
      <!ELEMENT email (#PCDATA)>
      <!ELEMENT hr EMPTY>
      <!ELEMENT lastmodified (#PCDATA)>
      <!ELEMENT signature (hr,copyright,email,lastmodified)>

      EntityImpl: name=erh
      TextImpl: Elliotte Rusty Harold

      I hope this can get applied in time for the next release of Xerces!

      Jake


          People

            Assignee: Michael Glavassevich (mrglavas@ca.ibm.com)
            Reporter: Jacob Kjome (hoju)
            Votes: 0
            Watchers: 0
