Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.1.4
Fix Version/s: None
Component/s: None
Labels: None
Environment: Linux CentOS-7 (64bit), Windows 7 (64bit)
Description
Apologies if this is a known issue, but I have not found it by conventional
means (i.e., Google and searching through the bug database here).
I found that the serialisation/deserialisation (here: of grammars) is not as portable as it (IMHO) should be.
The problem happens in XSerializeEngine::readString() when
the length of the string is taken from the associated BinInputStream as
"unsigned long":
    /***
     * Check if any data written
     ***/
    unsigned long tmp;
    *this>>tmp;
On Windows 7 x64 (MSVS2012), this takes 4 bytes off the head of the stream,
but on CentOS 7 x64 (g++ 4.8.3), it takes 8 bytes, because "unsigned long" is
32 bits wide with the former compiler and 64 bits wide with the latter.
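The size difference can be checked directly. The following is a minimal sketch; the two function names are made up for illustration and are not part of Xerces-C.

```cpp
#include <cstddef>
#include <cstdint>

// Width in bytes of the type readString() uses for the length prefix.
// Typically 4 with MSVC on 64-bit Windows and 8 with g++ on 64-bit Linux.
constexpr std::size_t lengthPrefixWidthNative() {
    return sizeof(unsigned long);
}

// Width in bytes of a fixed-width alternative; 4 on platforms with 8-bit bytes.
constexpr std::size_t lengthPrefixWidthFixed() {
    return sizeof(std::uint32_t);
}
```

Printing both values on each platform should show the native width changing while the fixed width stays the same.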
As a consequence, a BinInputStream carefully encoded on Windows (e.g., by
putting it into a char array with
examples/cxx/tree/embedded/grammar-input-stream.cxx
which is a common xsd example)
will fail when "reading" it on the Linux box, because everything from the first
string on is garbage.
Moreover, this will (probably) give no meaningful error message, just a thrown
"XSerializationException", because at some point it will (probably)
misinterpret wchar data as length information and try to read a string
that is millions of bytes long (according to the misread BinInputStream).
The BinInputStream will then run out of bytes.
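The failure mode can be reproduced outside Xerces-C with a simplified model. The sketch below assumes a made-up layout (a length prefix immediately followed by 16-bit characters, with no alignment padding); the function names are hypothetical and the real stream format differs in detail.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical writer: a 4-byte length prefix followed by n 16-bit characters,
// as a platform with a 4-byte "unsigned long" would produce.
std::vector<unsigned char> encodeWith32BitLength(const char16_t* s, std::uint32_t n) {
    std::vector<unsigned char> out(sizeof n + n * sizeof(char16_t));
    std::memcpy(out.data(), &n, sizeof n);
    std::memcpy(out.data() + sizeof n, s, n * sizeof(char16_t));
    return out;
}

// Hypothetical reader that assumes an 8-byte length prefix, as a platform with
// an 8-byte "unsigned long" would. The input must hold at least 8 bytes.
std::uint64_t decodeLengthAs64Bit(const std::vector<unsigned char>& in) {
    std::uint64_t len = 0;
    std::memcpy(&len, in.data(), sizeof len);  // swallows 4 bytes of character data
    return len;
}
```

Encoding a three-character string and decoding the length with the 8-byte reader yields a value far larger than the buffer itself, which is the point at which a real reader would run out of bytes.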
A similar issue concerns the alignment of the data according to data type,
which happens for all >> operations: this is (necessarily) very
platform dependent.
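To illustrate why alignment makes the layout depend on type width, here is a small sketch. It assumes the stream pads each value to a multiple of its own size; paddingFor is a hypothetical helper, not the Xerces-C implementation.

```cpp
#include <cstddef>

// Number of padding bytes needed so that `offset` becomes a multiple of `size`.
// Hypothetical helper for illustration only.
constexpr std::size_t paddingFor(std::size_t offset, std::size_t size) {
    return (size - offset % size) % size;
}
```

Under that assumption, a value at offset 2 is preceded by 2 padding bytes if it is 4 bytes wide but by 6 if it is 8 bytes wide, so writer and reader disagree on where the value starts as soon as their type widths differ.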
It would be a big improvement if Xerces encoded the (de)serialization
in a platform/compiler-independent manner. The purpose, after all, IS to be portable, right?
E.g., the serialisation engine could always use integers of known byte width
(e.g.: #include <inttypes.h> -> use uint32_t) instead of "unsigned long".
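A fixed-width encoding might look like the sketch below. Fixing the byte order as well goes one step beyond the suggestion above and is an assumption of this example; putU32LE and getU32LE are hypothetical names, not existing Xerces-C functions.

```cpp
#include <cstdint>

// Store v as exactly four bytes, least significant byte first.
inline void putU32LE(unsigned char* dst, std::uint32_t v) {
    dst[0] = static_cast<unsigned char>(v & 0xFFu);
    dst[1] = static_cast<unsigned char>((v >> 8) & 0xFFu);
    dst[2] = static_cast<unsigned char>((v >> 16) & 0xFFu);
    dst[3] = static_cast<unsigned char>((v >> 24) & 0xFFu);
}

// Rebuild the value from exactly four bytes, least significant byte first.
inline std::uint32_t getU32LE(const unsigned char* src) {
    return static_cast<std::uint32_t>(src[0])
         | (static_cast<std::uint32_t>(src[1]) << 8)
         | (static_cast<std::uint32_t>(src[2]) << 16)
         | (static_cast<std::uint32_t>(src[3]) << 24);
}
```

Because the bytes are handled one at a time, this form always consumes four bytes and has no alignment requirement on the buffer.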
Also, the alignment issue should be addressed; it is hard to predict
what restrictions apply for the compiler (or even processor) in use, and some are not capable of reading an integer from a memory address that is not 4-byte aligned.
E.g., the data could be copied (to a properly aligned item initialized with 0s)
before doing the cast to an integer type.
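The copy-before-cast idea can be sketched with std::memcpy. The function name loadU32Unaligned is made up for this example, and the value is read in the host's byte order.

```cpp
#include <cstdint>
#include <cstring>

// Read a 32-bit value from an address that may not be 4-byte aligned by
// copying the bytes into a properly aligned, zero-initialized local first.
inline std::uint32_t loadU32Unaligned(const unsigned char* p) {
    std::uint32_t v = 0;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```

This avoids dereferencing a misaligned pointer, so the stream itself would not need any padding for the read to be safe.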
In any case, the number of bytes to be read next from the BinInputStream should always be platform-independent.
(Of course, the write operations have to follow the same logic.)
Issue Links
- is duplicated by XERCESC-1959 serializeGrammars does not work between 32 and 64 bit systems (Open)