Description
The CMS API defines the string interface of TextMessage in terms of the C++ std::string and const char* primitives. Nothing in that interface addresses character encodings or multibyte string representations.
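Because std::string and const char* are plain byte containers, nothing in the type records how the bytes are encoded, and size() counts bytes rather than characters. A minimal illustration, assuming the literal holds "café" encoded as UTF-8. countUtf8CodePoints is a hypothetical helper, not part of CMS.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of CMS): counts code points in a UTF-8
// string by counting every byte that is NOT a continuation byte (10xxxxxx).
std::size_t countUtf8CodePoints(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

Here std::string("caf\xC3\xA9").size() is 5, while countUtf8CodePoints returns 4, because é (U+00E9) occupies the two bytes 0xC3 0xA9.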
To allow strings to be exchanged between Java, C++, and .NET clients, strings are serialized in Java's standard Modified UTF-8 format. This applies to the body of a TextMessage, to strings in MapMessage and StreamMessage, to strings written and read with writeUTF and readUTF on a BytesMessage, and to string-typed message properties. The design assumes that strings passed in by the application are US-ASCII and that strings received from the broker contain no character values greater than 255; if such a value is encountered, an exception is thrown.
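For reference, Java's Modified UTF-8 (as specified for DataOutput.writeUTF) differs from standard UTF-8 in two ways: the NUL character is written as the two bytes 0xC0 0x80, and supplementary characters are written as surrogate pairs of three bytes each. writeUTF also prefixes the encoded bytes with a two-byte length. The sketch below illustrates the behavior described above and is not the ActiveMQ-CPP implementation: it assumes each input byte is a character code from 0 to 255, omits the length prefix, and uses std::runtime_error as a stand-in for whatever exception the client actually throws. The names encodeModifiedUtf8 and decodeModifiedUtf8 are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical sketch: treat each byte (0-255) as a character code and encode
// it as Modified UTF-8 (the body of writeUTF, without the 2-byte length prefix).
std::string encodeModifiedUtf8(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c != 0 && c < 0x80) {
            out += static_cast<char>(c);                  // 0x01-0x7F: one byte
        } else {
            // 0x00 and 0x80-0xFF: two bytes, 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Hypothetical sketch: decode Modified UTF-8 back to one byte per character,
// throwing if a character value exceeds 255 (the limitation described above).
std::string decodeModifiedUtf8(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        unsigned int value = 0;
        if (b0 < 0x80) {                                  // 0xxxxxxx
            value = b0;
            i += 1;
        } else if ((b0 & 0xE0) == 0xC0) {                 // 110xxxxx 10xxxxxx
            if (i + 1 >= in.size()) throw std::runtime_error("truncated sequence");
            unsigned char b1 = static_cast<unsigned char>(in[i + 1]);
            if ((b1 & 0xC0) != 0x80) throw std::runtime_error("bad continuation byte");
            value = ((b0 & 0x1Fu) << 6) | (b1 & 0x3Fu);
            i += 2;
        } else if ((b0 & 0xF0) == 0xE0) {                 // 1110xxxx 10xxxxxx 10xxxxxx
            if (i + 2 >= in.size()) throw std::runtime_error("truncated sequence");
            unsigned char b1 = static_cast<unsigned char>(in[i + 1]);
            unsigned char b2 = static_cast<unsigned char>(in[i + 2]);
            if ((b1 & 0xC0) != 0x80 || (b2 & 0xC0) != 0x80)
                throw std::runtime_error("bad continuation byte");
            value = ((b0 & 0x0Fu) << 12) | ((b1 & 0x3Fu) << 6) | (b2 & 0x3Fu);
            i += 3;
        } else {
            throw std::runtime_error("invalid lead byte");
        }
        if (value > 0xFF) throw std::runtime_error("character value greater than 255");
        out += static_cast<char>(value);
    }
    return out;
}
```

For example, encodeModifiedUtf8("\xE9") yields the bytes 0xC3 0xA9, while decoding the three-byte sequence for U+20AC (the euro sign) throws, because its value exceeds 255.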
The CMS interface needs to be extended to allow more flexible string handling and to provide a mechanism for handling string encodings other than ASCII.
Another alternative is to change the assumption about strings in the CMS API: every string would be required to be either an ASCII string (char values no greater than 127, with no embedded nulls) or already encoded as Modified UTF-8 by the user, so that Java and .NET clients can read every string sent in a CMS message.
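Under this alternative the client would pass string bytes through unchanged, so an application might want to check its strings before sending them. The function below is a hypothetical sketch of such a check, not part of CMS. It verifies that a string is structurally valid Modified UTF-8: each lead byte is followed by the right number of 10xxxxxx continuation bytes, there are no raw NUL bytes, and there are no four-byte sequences. 7-bit ASCII without embedded nulls is a subset of this format, so it is accepted too. It does not reject every overlong encoding, so it is a sanity check rather than a full validator.

```cpp
#include <cstddef>
#include <string>

// Hypothetical check (not a CMS API): true if `s` is structurally valid
// Modified UTF-8. NUL-free 7-bit ASCII always passes.
bool isValidModifiedUtf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;                 // number of continuation bytes expected
        if (b0 == 0x00) {
            return false;                      // Modified UTF-8 writes NUL as 0xC0 0x80
        } else if (b0 < 0x80) {
            extra = 0;                         // 0xxxxxxx
        } else if ((b0 & 0xE0) == 0xC0) {
            extra = 1;                         // 110xxxxx
        } else if ((b0 & 0xF0) == 0xE0) {
            extra = 2;                         // 1110xxxx
        } else {
            // Stray continuation byte, or a 4-byte lead: Modified UTF-8 encodes
            // supplementary characters as two 3-byte surrogates instead.
            return false;
        }
        if (i + extra >= s.size()) {
            return false;                      // sequence truncated at end of string
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) {
                return false;                  // expected 10xxxxxx
            }
        }
        i += extra + 1;
    }
    return true;
}
```

A raw Latin-1 byte such as 0xE9 fails this check, which is exactly the case where, under this proposal, the user would have to encode the string first.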
I think the resolution to this issue will be to remove all string encoding from the C++ client and enforce the rule that if the client application wishes to send strings containing values larger than 127, it must first UTF-8 encode those strings itself.
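As an illustration of what UTF-8 encoding on the application side could look like, here is a minimal sketch assuming the application holds its text as Unicode code points in a std::u32string. The helper name toUtf8 is hypothetical. It is written by hand because std::wstring_convert and std::codecvt_utf8 were deprecated in C++17. For code points U+0001 to U+FFFF (excluding surrogates), the output is byte-for-byte identical to Modified UTF-8. It differs only for NUL, which Modified UTF-8 writes as 0xC0 0x80, and for characters above U+FFFF, which Modified UTF-8 writes as a six-byte surrogate pair.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical application-side helper: encodes Unicode code points as
// standard UTF-8 bytes in a std::string for the byte-oriented CMS interface.
std::string toUtf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw std::invalid_argument("not a Unicode scalar value");
        }
        if (cp < 0x80) {                                        // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                                // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                              // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                                // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The resulting std::string could then be passed to the existing std::string-based setters unchanged, leaving the C++ client free of any encoding logic.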