Description
When a message containing a Unicode code point that takes 4 bytes in UTF-8 (e.g. U+1F5A4, https://unicode-table.com/en/1F5A4/) is sent from:
a JMS producer to a STOMP consumer
or
a STOMP producer to a JMS consumer
the message is corrupted.
In the JMS to STOMP case, the code point arrives as the bytes
ef bf bd ef bf bd when it should be f0 9f 96 a4 (ef bf bd is the UTF-8 encoding of U+FFFD, the replacement character).
In the STOMP to JMS case, the JMS client throws an exception:
Exception in thread "main" javax.jms.JMSException: java.io.UTFDataFormatException
at org.apache.activemq.util.JMSExceptionSupport.create(JMSExceptionSupport.java:72)
at org.apache.activemq.command.ActiveMQTextMessage.decodeContent(ActiveMQTextMessage.java:104)
at org.apache.activemq.command.ActiveMQTextMessage.getText(ActiveMQTextMessage.java:84)
at testkonsument.App.JMS(App.java:86)
at testkonsument.App.main(App.java:42)
Caused by: java.io.UTFDataFormatException
at org.apache.activemq.util.MarshallingSupport.convertUTF8WithBuf(MarshallingSupport.java:389)
at org.apache.activemq.util.MarshallingSupport.readUTF8(MarshallingSupport.java:358)
at org.apache.activemq.command.ActiveMQTextMessage.decodeContent(ActiveMQTextMessage.java:101)
... 3 more
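The exception type in the trace above is the same one the JDK's own `DataInputStream.readUTF` raises when it is handed a standard 4-byte UTF-8 sequence, because that method expects Java's "modified UTF-8", in which a supplementary code point is stored as two 3-byte surrogates. The sketch below reproduces that JDK behaviour only. It is an assumption, not something this report confirms, that `MarshallingSupport.readUTF8` fails for the same reason; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; uses only JDK classes, not ActiveMQ code.
final class ReadUtfMismatchSketch {
    // U+1F5A4 as a Java string: one code point, two UTF-16 surrogates.
    static final String HEART = new String(Character.toChars(0x1F5A4));

    // Encodes the text as standard UTF-8, then feeds the bytes to a
    // modified-UTF-8 reader. Returns true if the reader rejects them.
    static boolean readUtfRejects(String text) {
        try {
            byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeShort(utf8.length); // readUTF expects a 2-byte length prefix
            out.write(utf8);
            new DataInputStream(new ByteArrayInputStream(buf.toByteArray())).readUTF();
            return false;
        } catch (UTFDataFormatException e) {
            return true; // lead byte 0xf0 is not valid in modified UTF-8
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readUtfRejects(HEART));    // true
        System.out.println(readUtfRejects("\u00F6")); // false
    }
}
```

If the broker passes the STOMP body through unchanged and the JMS client decodes it with a modified-UTF-8 reader, this is the failure one would expect to see.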
Using 4-byte Unicode code points
from STOMP to STOMP
or
from JMS to JMS
is not a problem: both work and do not corrupt the code point.
Note that 2-byte (e.g. https://unicode-table.com/en/00F6/) or 3-byte (e.g. https://unicode-table.com/en/2614/) Unicode code points do NOT get corrupted, even if the same message also includes a 4-byte Unicode code point.
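One encoding difference that matches this pattern: standard UTF-8 and Java's "modified UTF-8" (as written by `DataOutputStream.writeUTF`) produce identical bytes for 2- and 3-byte code points other than U+0000, and differ only for code points above U+FFFF. The sketch below compares the two for the three code points mentioned in this report. `writeUTF` is used as a stand-in; whether ActiveMQ's marshalling uses the same scheme is an assumption, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; uses only JDK classes, not ActiveMQ code.
final class EncodingComparisonSketch {
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Bytes produced by writeUTF, without its 2-byte length prefix.
    static byte[] modifiedUtf8(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            new DataOutputStream(buf).writeUTF(s);
            byte[] all = buf.toByteArray();
            return Arrays.copyOfRange(all, 2, all.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static boolean sameInBothEncodings(String s) {
        return Arrays.equals(standardUtf8(s), modifiedUtf8(s));
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] samples = {
            "\u00F6",                                // U+00F6, 2 bytes in UTF-8
            "\u2614",                                // U+2614, 3 bytes in UTF-8
            new String(Character.toChars(0x1F5A4))   // U+1F5A4, 4 bytes in UTF-8
        };
        for (String s : samples) {
            System.out.println(hex(standardUtf8(s)) + " | " + hex(modifiedUtf8(s)));
        }
        // c3 b6 | c3 b6
        // e2 98 94 | e2 98 94
        // f0 9f 96 a4 | ed a0 bd ed b6 a4
    }
}
```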
Issue Links
- is related to OPENWIRE-76 Standardize UTF-8 multi-byte character handling (Open)