  1. Xerces-C++
  2. XERCESC-1961

Invalid IGXMLScanner::fDTDGrammar, causing segfault

Details

    Description

      The problem occurs while OpenVXI is initialising, when it parses a couple of (hard-coded) DTDs and then a (hard-coded) XSD. During the DTD parsing (SAX2XMLReader::parse()), a DTDGrammar is created and a pointer to it is stored in two places: GrammarResolver::fGrammarBucket and IGXMLScanner::fDTDGrammar. At the start of the XSD parsing (SAX2XMLReader::loadGrammar()), the grammar bucket is cleared, which deletes the DTDGrammar but leaves IGXMLScanner::fDTDGrammar pointing to the freed object. Later in that parse, IGXMLScanner::getEntityDeclPool() is called and dereferences the dangling pointer via fDTDGrammar->getEntityDeclPool(). This sometimes causes a segfault, though usually only after our app, which performs these operations over and over, has been running for a few hours.

      I have some code which reproduces the problem - I'll attach it to this case as soon as I can work out how. Since the code rarely segfaults, I've been demonstrating it by adding printf()s to the DTDGrammar constructor/destructor and IGXMLScanner::getEntityDeclPool(). So my test code currently generates this:

      [peter@ultra1 xerces_bug]$ ./test
      DTDGrammar::DTDGrammar() this = 0x95691508
      Warning in file vxml 1.0 defaults at line 2 column 51
      Reason: Element 'metadata' was referenced in a content model but never declared
      DTDGrammar::~DTDGrammar() this = 0x95691508
      DTDGrammar::DTDGrammar() this = 0x956fa908
      DTDGrammar::DTDGrammar() this = 0x95ac4908
      DTDGrammar::~DTDGrammar() this = 0x95ac4908
      DTDGrammar::DTDGrammar() this = 0x95ac4908
      DTDGrammar::~DTDGrammar() this = 0x95ac4908
      DTDGrammar::DTDGrammar() this = 0x95ac4908
      DTDGrammar::~DTDGrammar() this = 0x95ac4908
      DTDGrammar::DTDGrammar() this = 0x95ac4908
      DTDGrammar::~DTDGrammar() this = 0x95ac4908
      DTDGrammar::DTDGrammar() this = 0x95ac4908
      DTDGrammar::~DTDGrammar() this = 0x95ac4908
      DTDGrammar::DTDGrammar() this = 0x95ac4908
      DTDGrammar::~DTDGrammar() this = 0x95ac4908
      DTDGrammar::DTDGrammar() this = 0x95ac4908
      DTDGrammar::~DTDGrammar() this = 0x95ac4908
      DTDGrammar::DTDGrammar() this = 0x95ac4908
      IGXMLScanner::getEntityDeclPool() fDTDGrammar = 0x95691508
      DTDGrammar::~DTDGrammar() this = 0x95ac4908
      DTDGrammar::~DTDGrammar() this = 0x956fa908
      [peter@ultra1 xerces_bug]$

      showing the DTDGrammar this=0x95691508 being created, deleted and then used by IGXMLScanner.
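The tracing approach above can be sketched as follows. This is only an illustration, not the code in the attachment: TracedGrammar and LiveRegistry are hypothetical names standing in for the instrumented DTDGrammar. As well as printing from the constructor and destructor, the sketch records which addresses are currently live, so a cached address can be checked without dereferencing it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <set>

// Hypothetical registry of the addresses of live grammar objects.
class LiveRegistry {
public:
    static std::set<std::uintptr_t>& live() {
        static std::set<std::uintptr_t> addresses;
        return addresses;
    }
    // True if an object is currently alive at this address.
    static bool isLive(std::uintptr_t addr) {
        return live().count(addr) != 0;
    }
};

// Hypothetical stand-in for DTDGrammar with printf() tracing added.
class TracedGrammar {
public:
    TracedGrammar() {
        LiveRegistry::live().insert(address());
        std::printf("TracedGrammar::TracedGrammar() this = %p\n",
                    static_cast<void*>(this));
    }
    ~TracedGrammar() {
        LiveRegistry::live().erase(address());
        std::printf("TracedGrammar::~TracedGrammar() this = %p\n",
                    static_cast<void*>(this));
    }
    TracedGrammar(const TracedGrammar&) = delete;
    TracedGrammar& operator=(const TracedGrammar&) = delete;

    // The address as an integer, safe to keep and compare after deletion.
    std::uintptr_t address() const {
        return reinterpret_cast<std::uintptr_t>(this);
    }
};
```

A caller would save address() while the object is alive and call LiveRegistry::isLive() on it later. Note that the allocator may reuse a freed address (as 0x95ac4908 is reused in the output above), so an address-only check can miss a stale pointer if a new object happens to land at the same address.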

      Our fix is to set fDTDGrammar to 0 after the bucket-clearing operation

      fGrammarResolver->useCachedGrammarInParse(toCache);

      at the start of IGXMLScanner::loadGrammar(), and this solves our problem.
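The shape of that fix can be sketched with simplified stand-ins. None of the classes below are the real Xerces-C++ classes; they only model the ownership described in this report (a bucket that owns the grammar, and a scanner that keeps a second, non-owning pointer). clearBucket() is a hypothetical name for the bucket-clearing step, and the null check in getEntityDeclPool() is an assumption of the sketch, not something the real IGXMLScanner is known to do.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins, NOT the real Xerces-C++ classes.
struct EntityDeclPool {
    int entityCount = 0;
};

class DTDGrammar {
public:
    EntityDeclPool* getEntityDeclPool() { return &fPool; }
private:
    EntityDeclPool fPool;
};

// Owns the grammars, in the role of GrammarResolver::fGrammarBucket.
class GrammarResolver {
public:
    DTDGrammar* putGrammar(std::unique_ptr<DTDGrammar> grammar) {
        assert(grammar != nullptr);
        fBucket.push_back(std::move(grammar));
        return fBucket.back().get();
    }
    // Deletes every grammar held in the bucket.
    void clearBucket() { fBucket.clear(); }
    std::size_t size() const { return fBucket.size(); }
private:
    std::vector<std::unique_ptr<DTDGrammar>> fBucket;
};

// Keeps a non-owning pointer, in the role of IGXMLScanner::fDTDGrammar.
class Scanner {
public:
    void parseDTD() {
        fDTDGrammar = fResolver.putGrammar(std::make_unique<DTDGrammar>());
    }
    void loadGrammar() {
        fResolver.clearBucket();   // the grammar is deleted here
        fDTDGrammar = nullptr;     // the fix: drop the now-dangling pointer
    }
    // Assumption of this sketch: callers tolerate a null pool.
    EntityDeclPool* getEntityDeclPool() {
        return fDTDGrammar ? fDTDGrammar->getEntityDeclPool() : nullptr;
    }
    std::size_t cachedGrammarCount() const { return fResolver.size(); }
private:
    GrammarResolver fResolver;
    DTDGrammar* fDTDGrammar = nullptr;
};
```

In this model, calling parseDTD() and then loadGrammar() leaves getEntityDeclPool() returning nullptr instead of reading freed memory. Without the reset line, the scanner would still hold the address of the deleted grammar, which is the situation shown in the trace.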

      We've reproduced the problem in v2.6.0 and v2.7.0, but v3.1.1 doesn't call IGXMLScanner::getEntityDeclPool() in our test code. However, tracing it in gdb I can see that v3.1.1 does potentially have the same problem, i.e. IGXMLScanner::fDTDGrammar is pointing to a deleted DTDGrammar after IGXMLScanner::loadGrammar() has cleared the cache.

      Attachments

        1. 2011-04-05_xerces_bug.tar.gz
          37 kB
          Peter Burns

          People

            Assignee: Unassigned
            Reporter: Peter Burns (csoft)