[XERCESC-1955] Xerces is popping up an exception while parsing a Unicode file, but the same works fine for an ANSI file - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Cannot Reproduce
Affects Version/s: 3.1.0
Fix Version/s: 3.1.0
Component/s: DOM
Labels:
None
Environment:
Windows XP 32Bit
Windows7 64bit

Description

Hi All,

Please let me know if anybody can provide a clue on this.

I have been using Xerces as the XML parser in my C++ application, and I recently migrated from Xerces 1.3 (very old) to 3.1.

Since then, when I call AbstractDOMParser::parse(const xercesc_3_1::InputSource& source) and pass a Unicode file as input, an exception pops up. The same call works fine for an ANSI file.
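In Windows terminology, a "Unicode" file usually means UTF-16LE with a byte order mark, while "ANSI" means the system code page. The report does not say which encoding the failing file actually uses, so a quick first check is to inspect the file's first bytes. The following is a minimal standalone sketch based on the autodetection table in Appendix F of the XML 1.0 specification. detectXmlEncoding is a hypothetical helper, not a Xerces API, and it covers only the common rows of that table.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical helper, not part of Xerces: guesses the encoding family of an
// XML file from its first bytes, following XML 1.0 Appendix F. The EBCDIC
// and unusual UCS-4 byte-order rows are omitted for brevity.
std::string detectXmlEncoding(const std::vector<unsigned char>& bytes)
{
    // True if 'bytes' begins with exactly the given prefix.
    auto startsWith = [&bytes](std::initializer_list<unsigned char> prefix) {
        if (bytes.size() < prefix.size()) return false;
        std::size_t i = 0;
        for (unsigned char c : prefix) {
            if (bytes[i++] != c) return false;
        }
        return true;
    };

    // Byte order marks. The 4-byte UTF-32 marks are tested first because
    // FF FE 00 00 also begins with the UTF-16LE mark FF FE.
    if (startsWith({0x00, 0x00, 0xFE, 0xFF})) return "UTF-32BE (BOM)";
    if (startsWith({0xFF, 0xFE, 0x00, 0x00})) return "UTF-32LE (BOM)";
    if (startsWith({0xEF, 0xBB, 0xBF}))       return "UTF-8 (BOM)";
    if (startsWith({0xFF, 0xFE}))             return "UTF-16LE (BOM)";
    if (startsWith({0xFE, 0xFF}))             return "UTF-16BE (BOM)";

    // No BOM: infer the layout from how "<?" or "<?xm" is encoded.
    if (startsWith({0x3C, 0x00, 0x3F, 0x00})) return "UTF-16LE (no BOM)";
    if (startsWith({0x00, 0x3C, 0x00, 0x3F})) return "UTF-16BE (no BOM)";
    if (startsWith({0x3C, 0x3F, 0x78, 0x6D})) return "ASCII-compatible (check the encoding declaration)";
    return "unknown";
}
```

For example, the bytes FF FE 3C 00 are reported as UTF-16LE (BOM), which is typically what Notepad writes when saving as "Unicode". If the file has no BOM, or declares an encoding that does not match its bytes, that mismatch is worth ruling out before suspecting the parser.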

The call stack is shown below:

xerces-c_3_1.dll!xercesc_3_1::XMLScanner::scanProlog() Line 1227 + 0x25 bytes
xerces-c_3_1.dll!xercesc_3_1::IGXMLScanner::scanDocument(const xercesc_3_1::InputSource & src={...}) Line 210
xerces-c_3_1.dll!xercesc_3_1::AbstractDOMParser::parse(const xercesc_3_1::InputSource & source={...}) Line 549
EPConfigTool.dll!XCfgXMLParser::parse() Line 66 (my application code)

In the Xerces code, execution reaches this branch:

else
{
    emitError(XMLErrs::InvalidDocumentStructure);
    ...
}
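The else branch above is taken whenever the next character in the prolog is neither '<' nor whitespace. The sketch below is a simplified, hypothetical model of that dispatch, not Xerces code: it stops at the first '<' instead of scanning declarations, comments and the DOCTYPE, and it treats only space, tab, CR and LF as whitespace. It uses std::u16string because XMLCh holds UTF-16 code units.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified, hypothetical model of the character dispatch in
// XMLScanner::scanProlog(). It returns the index of the first character that
// would land in the InvalidDocumentStructure branch, or npos if a '<' (markup)
// or the end of the text is reached first. Unlike the real scanner, it stops
// at the first '<' instead of scanning the declaration, PIs, comments or DOCTYPE.
std::size_t firstInvalidPrologChar(const std::u16string& text)
{
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t ch = text[i];
        if (ch == u'<')
            return std::u16string::npos;  // markup: handled by the '<' branch
        if (ch == u' ' || ch == u'\t' || ch == u'\r' || ch == u'\n')
            continue;                     // XML whitespace is skipped
        return i;                         // anything else: InvalidDocumentStructure
    }
    return std::u16string::npos;
}
```

For instance, if a UTF-16LE file were decoded with the wrong byte order, '<' (U+003C) would arrive as U+3C00, and the model reports index 0, i.e. the error would be emitted at the very first character. This only illustrates one way the error can arise; it is not a diagnosis of this report.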

The function in which parsing fails is shown below:

void XMLScanner::scanProlog()
{
    bool sawDocTypeDecl = false;
    // Get a buffer for whitespace processing
    XMLBufBid bbCData(&fBufMgr);

    // Loop through the prolog. If there is no content, this could go all
    // the way to the end of the file.
    try
    {
        while (true)
        {
            const XMLCh nextCh = fReaderMgr.peekNextChar();

            if (nextCh == chOpenAngle)
            {
                // Ok, it could be the xml decl, a comment, the doc type line,
                // or the start of the root element.
                if (checkXMLDecl(true))
                {
                    // There shall be at least --ONE-- space in between
                    // the tag '<?xml' and the VersionInfo.
                    //
                    // If we are not at line 1, col 6, then the decl was not
                    // the first text, so it's invalid.
                    const XMLReader* curReader = fReaderMgr.getCurrentReader();
                    if ((curReader->getLineNumber() != 1)
                    ||  (curReader->getColumnNumber() != 7))
                    {
                        emitError(XMLErrs::XMLDeclMustBeFirst);
                    }

                    scanXMLDecl(Decl_XML);
                }
                else if (fReaderMgr.skippedString(XMLUni::fgPIString))
                {
                    scanPI();
                }
                else if (fReaderMgr.skippedString(XMLUni::fgCommentString))
                {
                    scanComment();
                }
                else if (fReaderMgr.skippedString(XMLUni::fgDocTypeString))
                {
                    if (sawDocTypeDecl)
                    {
                        emitError(XMLErrs::DuplicateDocTypeDecl);
                    }

                    scanDocTypeDecl();
                    sawDocTypeDecl = true;

                    // if reusing grammar, this has been validated already in first scan
                    // skip for performance
                    if (fValidate && fGrammar && !fGrammar->getValidated())
                    {
                        // validate the DTD scan so far
                        fValidator->preContentValidation(fUseCachedGrammar, true);
                    }
                }
                else
                {
                    // Assume it's the start of the root element
                    return;
                }
            }
            else if (fReaderMgr.getCurrentReader()->isWhitespace(nextCh))
            {
                // If we have a document handler then gather up the
                // whitespace and call back. Otherwise just skip over spaces.
                if (fDocHandler)
                {
                    fReaderMgr.getSpaces(bbCData.getBuffer());
                    fDocHandler->ignorableWhitespace
                    (
                        bbCData.getRawBuffer()
                        , bbCData.getLen()
                        , false
                    );
                }
                else
                {
                    fReaderMgr.skipPastSpaces();
                }
            }
            else
            {
                emitError(XMLErrs::InvalidDocumentStructure);

                // Watch for end of file and break out
                if (!nextCh)
                    break;
                else
                    fReaderMgr.skipPastChar(chCloseAngle);
            }
        }
    }
    catch(const EndOfEntityException&)
    {
        // We should never get an end of entity here. They should only
        // occur within the doc type scanning method, and not leak out to
        // here.
        emitError
        (
            XMLErrs::UnexpectedEOE
            , "in prolog"
        );
    }
}
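The XMLDeclMustBeFirst check in the listing compares the reader position with line 1, column 7. Assuming checkXMLDecl() has just consumed "<?xml" plus the one required whitespace character (six characters in total) and that columns are 1-based, column 7 means the declaration started at the very first character of the document. The hypothetical model below reproduces that arithmetic on a plain std::string. It is not Xerces code: only '\n' starts a new line, and it does not verify that the sixth character is actually whitespace.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical model of the position check in scanProlog(): walks the text up
// to and including "<?xml" and the single character after it, tracking a
// 1-based (line, column) position, and returns where the reader would stand.
// Returns (0, 0) if no "<?xml" followed by at least one character is found.
std::pair<int, int> positionAfterXmlDeclStart(const std::string& text)
{
    const std::string::size_type start = text.find("<?xml");
    if (start == std::string::npos || start + 6 > text.size())
        return {0, 0};
    int line = 1, column = 1;
    for (std::string::size_type i = 0; i < start + 6; ++i) {
        if (text[i] == '\n') { ++line; column = 1; }
        else { ++column; }
    }
    return {line, column};
}

// True when the (line, column) == (1, 7) condition from scanProlog() holds,
// i.e. the declaration is the very first text in the document.
bool xmlDeclIsFirst(const std::string& text)
{
    return positionAfterXmlDeclStart(text) == std::make_pair(1, 7);
}
```

A single leading space or newline before "<?xml" moves the position to (1, 8) or (2, 7) respectively, so the condition fails and XMLDeclMustBeFirst would be emitted.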

It works fine when I move back to version 1.3, but due to other requirements I have to use version 3.1 in my application.

Thanks in advance,
Jojo

Attachments

MyXML.xml (43 kB), uploaded by Jojo Jose on 25/Jan/11 06:11

People

Assignee: Unassigned
Reporter: Jojo Jose
Votes: 0
Watchers: 0

Dates

Created: 21/Jan/11 09:22
Updated: 25/Jan/11 06:11
Resolved: 21/Jan/11 09:43

Time Tracking

Estimated: 20h
Remaining: 20h
Logged: Not Specified