[AMQCPP-235] UTF8 length marshalling bug in openwire readString and writeString. - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.2.6, 3.0
Component/s: Openwire
Labels:
None
Environment:

Windows XP / Visual Studio 2005

Patch Info:

Patch Available

Description

In investigating a bug for the check "if( str->size() > 65536 )" which should be "if( str->size() > 65535 )" in writeString() , I found a couple of other problems:
(1) The OpenwireStringSupport::readString method should read the utf8 length as an unsigned short rather than short. The problem is that utf8 encoded strings (using writeString) longer than 32768 will become truncated when read back using readString().
(2) The writeString() method should also check the value of utflen after determining the UTF8 length of the encoded string, since with the support of characters greater than value 127, encodings of 2 UTF8 bytes per byte can exist.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

OpenwireStringSupport.cpp.patch
08/Apr/09 16:53
2 kB
Martin Schlapfer
OpenwireStringSupportTest.cpp.patch
08/Apr/09 16:53
2 kB
Martin Schlapfer
OpenwireStringSupportTest.h.patch
08/Apr/09 16:53
0.7 kB
Martin Schlapfer

Activity

Ascending order - Click to sort in descending order

Martin Schlapfer added a comment - 08/Apr/09 16:53

Patch files for OpenwireStringSupport.cpp and associated test case OpenwireStringSupportTest.cpp and OpenwireStringSupportTest.h.

Martin Schlapfer added a comment - 08/Apr/09 16:53 Patch files for OpenwireStringSupport.cpp and associated test case OpenwireStringSupportTest.cpp and OpenwireStringSupportTest.h.

Timothy A. Bish added a comment - 08/Apr/09 21:51

Having looked through some more of the code I still think there is a larger issue here that will need to be addressed. I'm going to play around with some other changes and test cases.

Have you tried sending a TextMessage with a string that has ASCII values greater than 127, I'm betting that will fail to unmarshal correctly as well. This code in the String support class is currently only being used for properties.

Timothy A. Bish added a comment - 08/Apr/09 21:51 Having looked through some more of the code I still think there is a larger issue here that will need to be addressed. I'm going to play around with some other changes and test cases. Have you tried sending a TextMessage with a string that has ASCII values greater than 127, I'm betting that will fail to unmarshal correctly as well. This code in the String support class is currently only being used for properties.

Martin Schlapfer added a comment - 08/Apr/09 23:51

With respect to a TextMessage, it looks like longer strings (>65535) are supported. The length is encoded with 4 bytes rather than 2 bytes.

For character values greater than 127 contained in a text message:
(1) C++ to C++ Client: the (un)marshalling works (except in the case of a null character), because there is no UTF8 encoding/decoding performed.
(2) C++ to Java Client: the Java client throws a java.io.UTFDataFormatException when getText is called on the text message (at this point it is "unmarshalled" using UTF8 decoding, see java.org.apache.activemq.util.MarshallingSupport.readUTF8).

So it looks like an implementation for UTF8 encoding/decoding for the TextMessage is necessary to be interoperable with the Java client with character values larger than 127.

Martin Schlapfer added a comment - 08/Apr/09 23:51 With respect to a TextMessage, it looks like longer strings (>65535) are supported. The length is encoded with 4 bytes rather than 2 bytes. For character values greater than 127 contained in a text message: (1) C++ to C++ Client: the (un)marshalling works (except in the case of a null character), because there is no UTF8 encoding/decoding performed. (2) C++ to Java Client: the Java client throws a java.io.UTFDataFormatException when getText is called on the text message (at this point it is "unmarshalled" using UTF8 decoding, see java.org.apache.activemq.util.MarshallingSupport.readUTF8). So it looks like an implementation for UTF8 encoding/decoding for the TextMessage is necessary to be interoperable with the Java client with character values larger than 127.

Timothy A. Bish added a comment - 09/Apr/09 00:04

Yes, that's pretty much what I thought would happen. I'm reworking some of the code now, I should have this fixed in a few days. I will of course make sure to pick up the fixes from your patches as well.

BTW - Thanks for adding more unit testing, I really appreciate it when folks take the time to add them, keeps regressions from popping up.

Timothy A. Bish added a comment - 09/Apr/09 00:04 Yes, that's pretty much what I thought would happen. I'm reworking some of the code now, I should have this fixed in a few days. I will of course make sure to pick up the fixes from your patches as well. BTW - Thanks for adding more unit testing, I really appreciate it when folks take the time to add them, keeps regressions from popping up.

Timothy A. Bish added a comment - 14/Apr/09 17:53

Fixed in trunk and in the 2.x branch.

The DataInput/OutputStream's have been modified to correctly write and read modified UTF-8 using the normal Java standard which encodes the length as an unsigned short value limiting the length to 65535 encoded bytes. The OpenWireStringSupport class has been modified to correctly read and write modified UTF-8 that encode the length as a signed integer which removes limitations previously set on the length of properties in the Messages to 65535 encoded bytes. The ActiveMQTextMessage has been modified to also use the OpenWireString support Method instead of its own version of the code so now all text should properly encode ASCII values from 0-255.

Enhanced unit tests have also been added.

You are welcome to review the code and comment.

Timothy A. Bish added a comment - 14/Apr/09 17:53 Fixed in trunk and in the 2.x branch. The DataInput/OutputStream's have been modified to correctly write and read modified UTF-8 using the normal Java standard which encodes the length as an unsigned short value limiting the length to 65535 encoded bytes. The OpenWireStringSupport class has been modified to correctly read and write modified UTF-8 that encode the length as a signed integer which removes limitations previously set on the length of properties in the Messages to 65535 encoded bytes. The ActiveMQTextMessage has been modified to also use the OpenWireString support Method instead of its own version of the code so now all text should properly encode ASCII values from 0-255. Enhanced unit tests have also been added. You are welcome to review the code and comment.

Martin Schlapfer added a comment - 15/Apr/09 02:00

Hi Tim,
I'm out this week on vacation (checking email periodically). Next week, I'll take a close look at the changes and provide any feedback. Thanks!

Martin.

Martin Schlapfer added a comment - 15/Apr/09 02:00 Hi Tim, I'm out this week on vacation (checking email periodically). Next week, I'll take a close look at the changes and provide any feedback. Thanks! Martin.

Martin Schlapfer added a comment - 20/Apr/09 22:43

Tim,

Great job. I like that you refactored the string utf8 marshalling into the Data[Input|Output]Stream class to align with Java and used the OpenwireStringSupport class to handle the large string utf8 marshalling. You also used modified utf8 encoding, very nice.

Ideally the utf8 marshalling code could be contained in one spot regardless of integer (large string) or unsigned short length. This would avoid code duplication in the two locations and thus less chance for a bug to be introduced at some later point. One could do this, for example, by writing or reading the integer or unsigned short length accordingly, and then using a common piece of code to write / read the utf8 encoding. That's my only comment.

Looks great!

Cheers,
Martin.

Martin Schlapfer added a comment - 20/Apr/09 22:43 Tim, Great job. I like that you refactored the string utf8 marshalling into the Data [Input|Output] Stream class to align with Java and used the OpenwireStringSupport class to handle the large string utf8 marshalling. You also used modified utf8 encoding, very nice. Ideally the utf8 marshalling code could be contained in one spot regardless of integer (large string) or unsigned short length. This would avoid code duplication in the two locations and thus less chance for a bug to be introduced at some later point. One could do this, for example, by writing or reading the integer or unsigned short length accordingly, and then using a common piece of code to write / read the utf8 encoding. That's my only comment. Looks great! Cheers, Martin.

Timothy A. Bish added a comment - 20/Apr/09 23:30

Thanks for the input. Combining the code would have been ideal but I'm planning on splitting the decaf library out at some point into its own lib for use on some other projects so the AMQCPP client can't depend on that code so I kept them separate.

Timothy A. Bish added a comment - 20/Apr/09 23:30 Thanks for the input. Combining the code would have been ideal but I'm planning on splitting the decaf library out at some point into its own lib for use on some other projects so the AMQCPP client can't depend on that code so I kept them separate.

People

Assignee:: Timothy A. Bish

Reporter:: Martin Schlapfer

Votes:: 0 Vote for this issue

Watchers:: 0 Start watching this issue

Dates

Created:: 08/Apr/09 16:46

Updated:: 20/Apr/09 23:30

Resolved:: 14/Apr/09 17:53

ActiveMQ C++ Client

Details

Description

Attachments

Attachments

Activity

People

Dates