Xerces-C++ / XERCESC-1385

UTF-8 parse failure when there is a BOM in the UTF-8 header


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.7.0
    • Component/s: SAX/SAX2
    • Labels: None
    • Environment: OSX, CodeWarrior 9.4

    Description

      This issue is probably related to 1284 (or a duplicate of it).

      The attached sample code fails with Xerces 2.6.

      The problem seems to be that the UTF-8 BOM is checked twice. Below is a patch to XMLReader.cpp that resolves this issue. [The bug is that we have already detected the UTF-8 BOM and advanced fRawBufIndex past it, but the second check does not take this into account and looks for the BOM at the start of the buffer again.]
      src/xercesc/internal/XMLReader.cpp

      @@ -544,7 +544,7 @@
           }
           // If there's a utf-8 BOM (0xEF 0xBB 0xBF), skip past it.
           else {
      -        const char* asChars = (const char*)fRawByteBuf;
      +        const char* asChars = (const char*)(fRawByteBuf + fRawBufIndex);
               if ((fRawBytesAvail > XMLRecognizer::fgUTF8BOMLen )&&
                   (XMLString::compareNString( asChars
                                             , XMLRecognizer::fgUTF8BOM

      It is also possible that we should check whether a UTF-8 BOM has already been detected, since the code following the patched line would probably still accept a doubled UTF-8 BOM.

      Attachments

        1. test.cpp (2 kB, Miklos Fazekas)


          People

            Assignee: Unassigned
            Reporter: Miklos Fazekas (mfazekas)
            Votes: 2
            Watchers: 2

