  1. Xerces-C++
  2. XERCESC-1288

Wrong line/column number in UTFDataFormatException



    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.5.0, 2.6.0
    • Fix Version/s: None
    • Component/s: Non-Validating Parser
    • Labels: None
    • Environment: Linux (SUSE 9.1, Fedora Core 2, Red Hat 9) on Intel, Solaris 7 on SPARC, various gcc versions.


      I have the following (bad) XML file:
      --------------- bad.xml ----------------------------
      <?xml version="1.0" encoding="UTF-8"?>
      <field>Blah blah</field>
      <field>Blah blah ò blah blah</field>
      <field>Blah blah</field>
      (Note the accented 'o' in the second "field" line.)

      The file is bad because the accented 'o' is represented with a single
      byte, 0xf2. This is the hex dump:

      3e 42 6c 61 68 20 62 6c 61 68 20 f2 20 62 6c 61 |>Blah blah . bla|
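The dump also explains the wording of the error message. In UTF-8 a byte of the form 11110xxx, such as 0xf2, announces a 4-byte sequence, and every following byte of that sequence must have the form 10xxxxxx. Here the next byte is 0x20 (a space), which is the "invalid byte 2" the parser complains about. A correctly encoded 'ò' (U+00F2) would be the two bytes 0xc3 0xb2. The sketch below illustrates the check. The function names are hypothetical, this is not Xerces-C++ code, and it is simplified: it does not reject overlong or out-of-range lead bytes.

```cpp
#include <cstddef>

// Expected total length of a UTF-8 sequence, judged from its lead byte.
// Returns 0 if the byte cannot start a sequence.
// Simplified: overlong (0xC0, 0xC1) and out-of-range (0xF5..0xF7) leads
// are not rejected here.
std::size_t utf8SequenceLength(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: plain ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx (0xF2 lands here)
    return 0;                             // continuation byte or invalid lead
}

// True if the byte has the 10xxxxxx form required of trailing bytes.
bool isUtf8Continuation(unsigned char b) {
    return (b & 0xC0) == 0x80;
}
```

With these helpers, 0xf2 is classified as the start of a 4-byte sequence and 0x20 fails the continuation test, which matches the message reported by the parser.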

      The problem is that when I run "SAXPrint bad.xml" I get the following error:
      Fatal Error at file /users/valerio/tmp/bad.xml, line 1, char 39
      Message: An exception occurred! Type:UTFDataFormatException, Message:invalid byte 2 ( ) of a 4-byte sequence.

      The line and column reported by SAXParseException::getLineNumber()
      and SAXParseException::getColumnNumber() are wrong: the offending
      byte is on line 3 of the file shown above, not on line 1. I seem to
      recall this was not the case with older (2.0 or 2.2?) versions of
      Xerces-C, but I'm not sure.

      I noticed the issue with 2.5, then tried with 2.6 but there was
      no apparent difference. Can somebody take care of this? We often
      have big XML files to parse, and not knowing where the error
      really is is a real pain.
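Until the reported position is fixed, one possible workaround is to pre-scan the raw bytes and locate the first malformed UTF-8 sequence independently of the parser. The following is a minimal sketch under stated assumptions, not a fix for Xerces-C++ and not part of its API: `Position` and `findFirstInvalidUtf8` are hypothetical names, lines are split on '\n' only, columns are 1-based and counted in characters rather than bytes, and overlong or out-of-range encodings are not detected.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type: 1-based line and column of a character.
struct Position {
    std::size_t line;
    std::size_t column;
};

// Scans `data` as UTF-8. Returns true and fills `pos` with the location of
// the first malformed sequence; returns false if no malformed sequence is
// found. Assumption: '\n' is the only line terminator.
bool findFirstInvalidUtf8(const std::string& data, Position& pos) {
    std::size_t line = 1, column = 1, i = 0;
    while (i < data.size()) {
        const unsigned char lead = static_cast<unsigned char>(data[i]);
        std::size_t len = 0;
        if (lead < 0x80) len = 1;                 // ASCII
        else if ((lead & 0xE0) == 0xC0) len = 2;  // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) len = 3;  // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) len = 4;  // 11110xxx

        // A sequence is malformed if the lead byte is not a valid start,
        // if it runs past the end of the data, or if any trailing byte
        // is not of the form 10xxxxxx.
        bool ok = (len != 0) && (i + len <= data.size());
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(data[i + k]);
            ok = (b & 0xC0) == 0x80;
        }
        if (!ok) {
            pos.line = line;
            pos.column = column;
            return true;
        }

        if (lead == '\n') { ++line; column = 1; }
        else              { ++column; }
        i += len;
    }
    return false;
}
```

Applied to the contents of bad.xml as listed above (assuming the XML declaration is the first line and lines end with '\n'), this reports line 3, column 18, which is where the single 0xf2 byte sits. Reading the file into a `std::string` is left to the caller.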


        1. xercesc_wrong_position.cpp
          1 kB
          Igor Ignatyuk
        2. xercesc_wrong_position.xml
          0.0 kB
          Igor Ignatyuk



            Assignee: Unassigned
            Reporter: Valerio Gionco (gionco)