Xerces-C++ / XERCESC-2240

Junk characters (including null) allowed in XML declaration


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.2.3
    • Fix Version/s: None
    • Component/s: Non-Validating Parser
    • Labels: None
    • Environment: Linux

    Description

      In a library we wrote that uses Xerces-C++ to validate XML files against a given XSD, we discovered that XercesDOMParser::parse() does not record any error when the XML declaration at the start of a document contains "junk" characters, including control characters (^K) or null bytes (shown as ^@ below). The null character in particular is invalid anywhere in an XML document. For example, the following XML file (attached as basic_bad_bytes.xml) parses without error, but it should not:

      <?xml version="1.0" encoding^@^@^@^@^@="UTF-8" ?>
      <root_elem>
        <child_elem some_attr="abc" />
        <child_elem some_attr="def" />
      </root_elem>

      By contrast, the following XML (attached as basic_bad_bytes2.xml), with the same null bytes placed in the root element's start tag, correctly reports an error:

      <?xml version="1.0" encoding="UTF-8" ?>
      <root_elem^@^@^@^@^@>
        <child_elem some_attr="abc" />
        <child_elem some_attr="def" />
      </root_elem>

      This is similar to XERCESC-1701, where the end of the document after the root element was found to allow "junk" characters during parsing.
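
      Until the parser rejects such input itself, a caller could screen the raw bytes before handing them to XercesDOMParser::parse(). The following is only a minimal sketch of that idea, not part of Xerces-C++: the helper names isForbiddenXml10Byte and findForbiddenXml10Byte are hypothetical, and the check assumes an ASCII-compatible encoding such as UTF-8 (in UTF-16, zero bytes are legitimate). It covers only the C0 control range that the XML 1.0 Char production excludes, so it would catch the ^@ and ^K bytes described above but is not a full well-formedness check.

```cpp
#include <cstddef>
#include <string>

// True for byte values that XML 1.0 forbids in the ASCII range:
// every C0 control character except tab (0x09), line feed (0x0A),
// and carriage return (0x0D).
// Assumption: the input uses an ASCII-compatible encoding (e.g. UTF-8).
inline bool isForbiddenXml10Byte(unsigned char b) {
    return b < 0x20 && b != 0x09 && b != 0x0A && b != 0x0D;
}

// Hypothetical helper (not a Xerces-C++ API): returns the offset of the
// first forbidden byte in `data`, or std::string::npos if there is none.
inline std::size_t findForbiddenXml10Byte(const std::string& data) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (isForbiddenXml10Byte(static_cast<unsigned char>(data[i]))) {
            return i;
        }
    }
    return std::string::npos;
}
```

      A caller would read the file into a std::string (which can hold embedded null bytes), call findForbiddenXml10Byte() on it, and treat any result other than std::string::npos as a validation failure before parsing.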


          People

            Assignee: Unassigned
            Reporter: Benjamin Fritz (ColBFritz)
