In certain circumstances, schema validation miscalculates the length of a string that contains &amp; (and possibly other) entity references. The following XML instance and schema reproduce the error in the Sax2Print example. I first noticed it when using Xerces as a validator from within Xalan-C, so the problem is unlikely to be confined to this example.
Test.xml:
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Test.xsd">
<flibble>curiouser &amp; curiouser&amp;curiouser</flibble>
</root>
Test.xsd (excerpt):
<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:union memberTypes="TextString Null"/>
<xs:minLength value="1" />
<xs:maxLength value="31" />
<xs:length value="0" />
Error at file F:\My Documents\Mike\Visual Studio Projects\xerces-c-src_2_5_0\Build\Win32\VC7\Debug/Test.xml, line 3, char 60
Message: Datatype error: Type:InvalidDatatypeValueException, Message:Value 'curiouser & curiouser &curiouser' does not match any member types (of the union) .
There are a few things to note:
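+ Counting the characters shows that the input string is exactly 31 characters long, so it should satisfy the first member of the union (maxLength 31). However, the validator has inserted an extra space before the second ampersand, making the value 32 characters long, which no member type accepts.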
+ I have not determined the exact pattern within the string that triggers this, but it seems to require two ampersands, the second of which is not preceded by a space.
+ I do not know whether this is restricted to &amp; or also applies to other entity or character references, or to a combination of them (since more than one appears to be necessary).
+ This only happens with a union. If the schema instead applies the length restriction directly to the string type, validation raises no complaint.
+ Running Sax2Print with -s (i.e. no schema validation) prints the input document with the string processed correctly (i.e. with the correct number of characters). The extra space appears only when the validator is switched on. The same holds for XSLT transformations in Xalan: with validation on, the validator complains; with it off, the string is output at the correct length.
I have spent some time trying to figure out what is going on in order to produce a patch. I will continue to do so, but at the moment, I am not having much luck. If anyone else with a better understanding of the code wants to jump in and steal my thunder, I won't be at all offended.