Xerces-C++ / XERCESC-1828

LexicalHandler startEntity/endEntity events not paired and have incorrect arguments

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 3.1.0
    • Component/s: SAX/SAX2
    • Labels: None
    • Environment: OS/X, Win32

    Description

      It appears that the LexicalHandler events startEntity and endEntity are not sent correctly when parsing a document with a DTD that itself references external entities.

      (Note: I will attach sample XML, repro code, and the full output of the code. The following is a summary.)

      For example, I have been parsing a valid XHTML document. The strict XHTML DTD includes 4 other files with entity declarations. I see the following events on my LexicalHandler (ignoring elements, characters, whitespace, external entity declarations, and comments):

      startDocument
      ...
      startDTD: html, -//W3C//DTD XHTML 1.0 Strict//EN, http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
      ...
      startEntity: [dtd]
      ...
      startEntity: [dtd]
      ...
      startEntity: [dtd]
      ...
      startEntity: [dtd]
      ...
      endEntity: [dtd]
      ...
      endDTD
      ...
      endDocument

      I expected something more like the following (as generated by the standard SAX parser in Java 6):

      startDocument
      startDTD: 'html', '-//W3C//DTD XHTML 1.0 Strict//EN', 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'
      startEntity: '[dtd]'
      startEntity: '%HTMLlat1'
      endEntity: '%HTMLlat1'
      startEntity: '%HTMLsymbol'
      endEntity: '%HTMLsymbol'
      startEntity: '%HTMLspecial'
      endEntity: '%HTMLspecial'
      startEntity: '%head.misc'
      endEntity: '%head.misc'
      startEntity: '%head.misc'
      endEntity: '%head.misc'
      startEntity: '%head.misc'
      endEntity: '%head.misc'
      startEntity: '%head.misc'
      endEntity: '%head.misc'
      startEntity: '%head.misc'
      endEntity: '%head.misc'
      startEntity: '%block'
      endEntity: '%block'
      startEntity: '%inline'
      endEntity: '%inline'
      startEntity: '%misc'
      endEntity: '%misc'
      startEntity: '%block'
      endEntity: '%block'
      startEntity: '%misc'
      endEntity: '%misc'
      startEntity: '%block'
      endEntity: '%block'
      startEntity: '%inline'
      endEntity: '%inline'
      startEntity: '%misc'
      endEntity: '%misc'
      endEntity: '[dtd]'
      endDTD
      startPrefixMapping: '', 'http://www.w3.org/1999/xhtml'
      endPrefixMapping: ''
      endDocument
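
      The pairing property that the expected output satisfies, and the observed output violates, can be stated as a small check. The following is a minimal sketch in standard C++ that does not use Xerces at all; `EntityEvent` and `entityEventsBalanced` are hypothetical names introduced here for illustration, and the events are assumed to have been recorded as (kind, name) pairs by a logging handler.

```cpp
#include <string>
#include <utility>
#include <vector>

// One recorded lexical event: first is the event kind ("startEntity" or
// "endEntity"), second is the entity name. Hypothetical type for this sketch.
using EntityEvent = std::pair<std::string, std::string>;

// Returns true when every startEntity is closed by an endEntity carrying the
// same name, in properly nested order, and nothing is left open at the end.
bool entityEventsBalanced(const std::vector<EntityEvent>& events) {
    std::vector<std::string> open;  // stack of currently open entity names
    for (const auto& ev : events) {
        if (ev.first == "startEntity") {
            open.push_back(ev.second);
        } else if (ev.first == "endEntity") {
            if (open.empty() || open.back() != ev.second) {
                return false;  // endEntity without a matching startEntity
            }
            open.pop_back();
        }
        // Any other event kind is ignored by this check.
    }
    return open.empty();
}
```

      Fed the observed sequence (four `startEntity: [dtd]` followed by one `endEntity: [dtd]`), the check returns false because three entities are never closed; a properly nested sequence such as the Java output returns true.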

      At a minimum, the mismatch of startEntity/endEntity events appears to be caused by the following code from DTDScanner::scanExtSubsetDecl (notice that the conditions are not the same):

      if (fDocTypeHandler && !inIncludeSect)
          fDocTypeHandler->startExtSubset();

      ...
      ...
      ...

      if (fDocTypeHandler && isDTD)
          fDocTypeHandler->endExtSubset();
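
      To make the effect of the two different guards concrete, here is a simplified model in standard C++. It is only a sketch: `RecordingHandler` and `scanExtSubsetDeclModel` are hypothetical stand-ins, the actual scanning work is left out, and which flag combinations the real scanner passes is an assumption not confirmed by this report.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the doc-type handler; it only records calls.
struct RecordingHandler {
    std::vector<std::string> calls;
    void startExtSubset() { calls.push_back("start"); }
    void endExtSubset() { calls.push_back("end"); }
};

// Simplified model of the two guards quoted above. Only the conditions are
// reproduced; everything between them is elided.
void scanExtSubsetDeclModel(RecordingHandler* handler,
                            bool inIncludeSect,
                            bool isDTD) {
    if (handler && !inIncludeSect)
        handler->startExtSubset();

    // ... scanning of the external subset would happen here ...

    if (handler && isDTD)
        handler->endExtSubset();
}
```

      In this model, a call with `inIncludeSect == false` and `isDTD == false` records a start with no end, while a call with both flags true records an end with no start. Only the combination `inIncludeSect == false`, `isDTD == true` yields a matched pair, which is consistent with the unpaired events listed above.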

      Attachments

        1. java.output (130 kB, Erik Wright)
        2. SAX2EventsSample.tgz (9 kB, Erik Wright)
        3. Test.java (6 kB, Erik Wright)
        4. test.output (136 kB, Erik Wright)
        5. test.xml (0.3 kB, Erik Wright)

          People

            Assignee: amassari Alberto Massari
            Reporter: erik@wrighttechnologysolutions.com Erik Wright