  1. Xerces-C++
  2. XERCESC-2016

XML 1.0 5th edition support

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2.0
    • Component/s: Non-Validating Parser
    • Labels: None
    • All

    Description

      Xerces-C currently applies XML 1.0 4th edition rules to name characters
      in XML 1.0 documents. XML 1.0 5th edition permits a broader class
      of name characters, based on those permitted in XML 1.1.
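      For reference, the broader name-character classes can be written out directly from productions [4] and [4a] of the XML 1.0 Fifth Edition grammar. The sketch below is only an illustration of those ranges; the function names are hypothetical and this is not how Xerces-C implements the check (it uses precomputed flag tables).

```cpp
#include <cstdint>

// NameStartChar, XML 1.0 (Fifth Edition) production [4].
// Hypothetical helper operating on a full Unicode code point.
bool isNameStartChar5e(std::uint32_t c)
{
    return c == ':' || c == '_'
        || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
        || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
        || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
        || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
        || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
        || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
}

// NameChar, production [4a]: NameStartChar plus a few extra characters.
bool isNameChar5e(std::uint32_t c)
{
    return isNameStartChar5e(c)
        || c == '-' || c == '.' || (c >= '0' && c <= '9') || c == 0xB7
        || (c >= 0x0300 && c <= 0x036F) || (c >= 0x203F && c <= 0x2040);
}
```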

      Proposal: that Xerces-C 3.2.0 be updated to include support for XML 1.0
      5th edition.

      Although our main work is with icXML, we've looked at making this change
      in the original Xerces-C code base so that icXML and Xerces-C remain
      compatible in their support for XML 1.0 5e.

      I'm not entirely sure that I've handled everything, but the following change
      works in our tests. The change plan is below and an svn diff file is
      attached.

      Here is the change plan.
      ----------------------------------

      (1) internal/CharTypeTables.hpp

      Rename gFirstNameChars1_1 to be gFirstNameChars
      Rename gNameChars1_1 to be gNameChars

      (2) util/XMLChar.cpp
      (2a)
      Update initCharFlagTable1_1() to use the gFirstNameChars, gNameChars
      Update initCharFlagTable() to use the set-ups from initCharFlagTable1_1()
      to define gNameCharMask, gNCNameCharMask, and gFirstNameCharMask.
      //
      // Name characters are special. A name is made up of a number of
      // different tables and some special case characters.
      //
      initOneTable(gNameChars, gNameCharMask);

      //
      // NCName characters are the name characters minus the colon.
      //
      initOneTable(gNameChars, gNCNameCharMask);
      gTmpCharTable[chColon] &= ~gNCNameCharMask;

      //
      // Then do the first name char
      //
      initOneTable(gFirstNameChars, gFirstNameCharMask);
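      The idea behind step (2a) can be sketched as follows: one flag byte per UTF-16 code unit, with each character class OR-ed in under its own mask. This is a simplified stand-in, not the Xerces-C code. The mask values, the Range struct, setRanges(), and the ASCII-only range lists are assumptions made for illustration; the real tables in CharTypeTables.hpp use a different layout and cover the full BMP.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical mask values; the real ones are defined by Xerces-C.
const std::uint8_t kNameCharMask      = 0x1;
const std::uint8_t kNCNameCharMask    = 0x2;
const std::uint8_t kFirstNameCharMask = 0x4;

struct Range { std::uint16_t lo, hi; };

// OR `mask` into every table entry covered by `ranges`.
void setRanges(std::vector<std::uint8_t>& table,
               const std::vector<Range>& ranges, std::uint8_t mask)
{
    for (const Range& r : ranges)
        for (std::uint32_t c = r.lo; c <= r.hi; ++c)
            table[c] |= mask;
}

std::vector<std::uint8_t> buildFlagTable()
{
    // Tiny ASCII-only stand-ins for gFirstNameChars and gNameChars.
    const std::vector<Range> firstNameChars =
        {{':', ':'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'}};
    std::vector<Range> nameChars = firstNameChars;
    nameChars.push_back({'-', '.'});  // '-' and '.' are adjacent in ASCII
    nameChars.push_back({'0', '9'});

    std::vector<std::uint8_t> table(0x10000, 0);
    setRanges(table, nameChars, kNameCharMask);
    setRanges(table, nameChars, kNCNameCharMask);
    // NCNames exclude the colon.
    table[':'] &= static_cast<std::uint8_t>(~kNCNameCharMask);
    setRanges(table, firstNameChars, kFirstNameCharMask);
    return table;
}
```

      With a single shared pair of range lists, the 1.0 and 1.1 tables end up with the same name-character flags, which is the point of the rename in step (1).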

      (2b) #define NEED_TO_GEN_TABLE,
      then compile and run a sample Xerces app to generate table.out.

      (2c) Replace the XMLChar1_0::fgCharCharsTable1_0 definition in XMLChar.cpp
      with that from table.out.

      (3) XMLChar.hpp
      Modify XMLChar1_0::isFirstNameChar, XMLChar1_0::isFirstNCNameChar,
      XMLChar1_0::isNameChar, XMLChar1_0::isNCNameChar
      to each check for and allow characters in the #x10000-#xEFFFF range

      else
      {
          if ((toCheck >= 0xD800) && (toCheck <= 0xDB7F))
              if ((toCheck2 >= 0xDC00) && (toCheck2 <= 0xDFFF))
                  return true;
      }
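      The surrogate bounds above correspond exactly to the #x10000-#xEFFFF range. The following standalone sketch (hypothetical function names, not part of XMLChar.hpp) shows the same pair test together with the standard UTF-16 decoding, so the bounds can be checked by hand.

```cpp
#include <cstdint>

// True if (hi, lo) is a UTF-16 surrogate pair whose code point lies in
// #x10000-#xEFFFF. High surrogates above 0xDB7F encode #xF0000 and up.
bool isSupplementaryNameChar(char16_t hi, char16_t lo)
{
    return hi >= 0xD800 && hi <= 0xDB7F && lo >= 0xDC00 && lo <= 0xDFFF;
}

// Standard UTF-16 decoding of a surrogate pair into a code point.
std::uint32_t decodeSurrogatePair(char16_t hi, char16_t lo)
{
    return 0x10000u
         + ((static_cast<std::uint32_t>(hi) - 0xD800u) << 10)
         + (static_cast<std::uint32_t>(lo) - 0xDC00u);
}
```

      The smallest accepted pair (0xD800, 0xDC00) decodes to #x10000 and the largest (0xDB7F, 0xDFFF) decodes to #xEFFFF.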

      (4) Modify XMLReader::getName and XMLReader::getNCName
      to allow surrogate pairs in Names and NCNames
      (i.e., use the version 1.1 logic for both 1.0 and 1.1).
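      A name-scanning loop that accepts surrogate pairs might look roughly like the sketch below. It is not the actual XMLReader::getName code: scanName() and the two ASCII-only BMP helpers are hypothetical stand-ins for the reader's buffer handling and table lookups, used only to show where the pair check fits in the loop.

```cpp
#include <cstddef>
#include <string>

// Simplified stand-ins for the BMP flag-table lookups (ASCII only).
bool isBmpFirstNameChar(char16_t c)
{
    return c == u':' || c == u'_'
        || (c >= u'A' && c <= u'Z') || (c >= u'a' && c <= u'z');
}

bool isBmpNameChar(char16_t c)
{
    return isBmpFirstNameChar(c)
        || c == u'-' || c == u'.' || (c >= u'0' && c <= u'9');
}

// Returns the number of UTF-16 code units at the start of `buf` that
// form a Name, or 0 if `buf` does not start with one.
std::size_t scanName(const std::u16string& buf)
{
    std::size_t i = 0;
    while (i < buf.size()) {
        const char16_t c = buf[i];
        if (c >= 0xD800 && c <= 0xDB7F) {
            // High surrogate: a valid low surrogate must follow.
            if (i + 1 >= buf.size())
                break;
            const char16_t c2 = buf[i + 1];
            if (c2 < 0xDC00 || c2 > 0xDFFF)
                break;
            i += 2;  // consume both halves of the pair
        } else {
            const bool ok = (i == 0) ? isBmpFirstNameChar(c)
                                     : isBmpNameChar(c);
            if (!ok)
                break;
            ++i;
        }
    }
    return i;
}
```

      Because the supplementary range is valid both as a first character and as a later character, the pair branch does not need to distinguish the two positions.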

      Attachments

        1. diff5e
          277 kB
          Rob Cameron

          People

            amassari Alberto Massari
            cameron@cs.sfu.ca Rob Cameron