Description
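The XML file (test.xml) below contains two French accented letters in its comment line. They are displayed here as '�' because they are stored as bytes that are not valid UTF-8, even though the file declares encoding="UTF-8".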
file test.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<Project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<!-- d'�t� 2008 -->
</Project>
When I run DOMPrint test.xml or SAXPrint test.xml, a fatal error occurs. Here is the output:
DOMCount test.xml
Fatal Error at file /home/bdai/xfndry/HEAD/env/xerces-c-3.0.0-x86_64-linux-gcc-3.4/bin/test.xml, line 1, char 40
Message: invalid byte 't' at position 2 of a 3-byte sequence
Errors occurred, no output available
Regardless of where I move the offending line, the reported line number (1) and column (40) do not change.
Debugging through the code, I see that XMLReader.cpp keeps track of the line and column numbers, but it calls XMLUTF8Transcoder.cpp, which examines each byte. When the transcoder finds a byte that is not valid UTF-8, it throws an exception, so XMLReader never gets the chance to update its line and column numbers for the block being transcoded.
If the error occurs after the first 4K (hex) bytes, the reported line advances to a later line, but it then stays the same for any error inside the second 4K (hex) bytes, regardless of where the error actually is.
It would be helpful to report the real line number.
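To illustrate what reporting the real position could look like, here is a minimal standalone sketch that scans a byte buffer, counts lines and columns as it goes, and stops at the first malformed UTF-8 sequence. It is not Xerces-C code: BadBytePosition, sequenceLength and findFirstInvalidUtf8 are hypothetical names invented for this example, and the check only looks at lead and continuation byte patterns (it does not reject overlong encodings or surrogates).

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch; not part of the Xerces-C API.
struct BadBytePosition {
    std::size_t line;    // 1-based line of the malformed sequence
    std::size_t column;  // 1-based column, counted in characters
};

// Expected length of a UTF-8 sequence given its lead byte, or 0 if the
// byte cannot start a sequence (for example a stray continuation byte).
static std::size_t sequenceLength(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx (0xE9 falls here)
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Returns true and fills 'pos' with the line and column where the first
// malformed sequence starts; returns false if no malformed sequence is found.
bool findFirstInvalidUtf8(const std::string& bytes, BadBytePosition& pos) {
    std::size_t line = 1;
    std::size_t column = 1;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        const std::size_t len = sequenceLength(lead);
        bool ok = (len != 0);
        // Every trailing byte must exist and look like 10xxxxxx.
        for (std::size_t k = 1; ok && k < len; ++k) {
            ok = (i + k < bytes.size()) &&
                 ((static_cast<unsigned char>(bytes[i + k]) & 0xC0) == 0x80);
        }
        if (!ok) {
            pos.line = line;
            pos.column = column;
            return true;
        }
        // Advance the position only after the character has been validated.
        if (lead == '\n') {
            ++line;
            column = 1;
        } else {
            ++column;
        }
        i += len;
    }
    return false;
}
```

For a buffer like test.xml, where the comment is on the third line and the first bad byte follows the seven characters "<!-- d'", this sketch would report line 3, column 8. The design point is that the position is advanced character by character alongside validation, so it is still accurate at the moment the malformed byte is detected.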