Hi All!
We ran into the same problem... :o( I have fixed this issue and am providing the patch and the test (separately). The patch is tested with the special German characters "ä ö ü ß Ä Ö Ü".
Regards, Christian

Hi All!
We also ran into this one (again with German characters), again with a service that does not support character entities properly. Thanks to Christian for the fix. By the way, I was surprised when I saw the original source code: I assumed a UTF8Encoder would produce UTF-8-encoded XML rather than forcing character entities for every character beyond 0x7F... very funny.
Regards, Tom

Hi,
Is there any version of axis.jar with the patch already applied? We are facing a similar issue in our application with Axis 1.3.
Regards, Vinod

I am a bit puzzled by this bug.
In principle, I agree with Thiago: if the output writer is created with the correct encoding (and it seems it is), there should be no need to "re-encode" characters above 0x7F in UTF-8, or above 0xFFFF in UTF-16. The class org.apache.axis.components.encoding.AbstractXmlEncoder seems to handle this correctly in its encode() method. The problem is that none of its subclasses uses the same strategy in their writeEncoded() methods. Why is that? In fact, looking at the code, once the entity-replacement code is removed from the subclasses, they are all identical! It seems we could live with a single XMLEncoder implementation for all encodings. Can anybody confirm or correct this?

This patch modifies the DefaultXMLEncoder and XMLEncoderFactory classes as described in my previous comments.
It seems to work. At least, it passes most functional tests (those not relying on unavailable remote services). I have also tested it successfully with SoapUI. Hope it helps.

Unit test for the patch in AXIS_2342.diff
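To make the single-encoder idea above concrete, here is a minimal sketch, assuming the escaper only needs to know the output charset. The class name CharsetAwareXmlEscaper and its methods are hypothetical: they do not reproduce the actual AbstractXmlEncoder or DefaultXMLEncoder code or the attached patch. The only library facility relied on is the JDK's java.nio.charset.CharsetEncoder.canEncode.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch, not an Axis class: one XML text escaper for any output
 * charset. Markup characters are always escaped; other characters are written
 * as-is when the target charset can represent them, and only unrepresentable
 * ones fall back to numeric character references.
 */
public class CharsetAwareXmlEscaper {

    public static void writeEncoded(Writer writer, String xmlString, Charset charset)
            throws IOException {
        CharsetEncoder encoder = charset.newEncoder();
        int i = 0;
        while (i < xmlString.length()) {
            int cp = xmlString.codePointAt(i);
            int len = Character.charCount(cp);
            switch (cp) {
                case '&':  writer.write("&amp;");  break;
                case '<':  writer.write("&lt;");   break;
                case '>':  writer.write("&gt;");   break;
                case '"':  writer.write("&quot;"); break;
                case '\'': writer.write("&apos;"); break;
                case '\t':
                case '\n':
                case '\r':
                    // Referenced numerically so they survive attribute-value normalization.
                    writer.write(charRef(cp));
                    break;
                default:
                    if (cp < 0x20) {
                        throw new IllegalArgumentException(
                                "Invalid XML character 0x" + Integer.toHexString(cp));
                    }
                    String ch = xmlString.substring(i, i + len);
                    // Raw character if the charset has it; reference only as a fallback.
                    // (Lone surrogates are not handled in this sketch.)
                    writer.write(encoder.canEncode(ch) ? ch : charRef(cp));
            }
            i += len;
        }
    }

    static String charRef(int cp) {
        return "&#x" + Integer.toHexString(cp).toUpperCase() + ";";
    }

    /** Convenience: the escaped text as a String. */
    public static String escape(String s, Charset charset) throws IOException {
        StringWriter out = new StringWriter();
        writeEncoded(out, s, charset);
        return out.toString();
    }

    /** Convenience: the escaped text as bytes, written through a Writer in that charset. */
    public static byte[] encode(String s, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, charset)) {
            writeEncoded(w, s, charset);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "a<b & \u00E4\u20AC"; // "a<b & ä€"
        System.out.println(escape(text, StandardCharsets.UTF_8));      // a&lt;b &amp; ä€
        System.out.println(escape(text, StandardCharsets.ISO_8859_1)); // a&lt;b &amp; ä&#x20AC;
        System.out.println(escape(text, StandardCharsets.US_ASCII));   // a&lt;b &amp; &#xE4;&#x20AC;
    }
}
```

With UTF-8 or UTF-16 every valid character is written raw, so references only appear for charsets such as ISO-8859-1 or US-ASCII that cannot represent the character. That is the sense in which one implementation could serve all encodings.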
We ran into this issue with Axis 1.4 in a hybrid Java/Perl/.NET environment while trying to transmit a euro sign (Unicode U+20AC, UTF-8 bytes E2 82 AC). The Axis 1.4 service advertised its output as UTF-8, but the euro sign was encoded as a numeric character entity instead of the raw UTF-8 bytes, which in my opinion looks more like a dirty hack.
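For reference, a small Java check of the byte values quoted above. The asEntity helper is hypothetical and uses the hexadecimal reference form; whether Axis 1.4 emits hexadecimal or decimal references is an assumption here, not something stated in the comment.

```java
import java.nio.charset.StandardCharsets;

public class EuroSignBytes {

    // Hex-format a byte array, e.g. {0xE2, 0x82, 0xAC} -> "E2 82 AC".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Hypothetical: the numeric character reference an entity-forcing encoder might emit.
    static String asEntity(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint).toUpperCase() + ";";
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        byte[] utf8 = euro.getBytes(StandardCharsets.UTF_8);
        String entity = asEntity(euro.codePointAt(0));
        System.out.println(hex(utf8)); // E2 82 AC
        System.out.println(entity);    // &#x20AC;
        System.out.println(entity.length() + " bytes as a reference vs " + utf8.length + " bytes raw");
        // 8 bytes as a reference vs 3 bytes raw
    }
}
```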
What actually helped was removing all the special encoding code from the default case of the writeEncoded method in org.apache.axis.components.encoding.UTF8Encoder. With that change, Axis output a proper UTF-8 euro sign. It looks like the final byte encoding happens at a higher level in Axis, but I didn't look into it further. The relevant section of UTF8Encoder becomes:

        // ... preceding cases unchanged ...
        case '\t':
            writer.write(TAB);
            break;
        default:
            if (character < 0x20) {
                throw new IllegalArgumentException(Messages.getMessage(
                        "invalidXmlCharacter00",
                        Integer.toHexString(character),
                        xmlString.substring(0, i)));
            } else {
                // Write the character unchanged; the Writer's own charset
                // encoding is left to produce the bytes.
                writer.write(character);
            }
            break;
    }

This is a major problem for our project: we need to send Russian text through Axis to Axis/C and .NET.
How can this encoding be called UTF-8 if it encodes every character above 0x7F in a special way? It should be called HTML encoding then. The right way of encoding is shown in the previous patch. For now we are looking for some kind of workaround, maybe using a generic type with our own Serializer/Deserializer classes.
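Both forms are well-formed XML, so a conforming parser should return the same text for either. The difference only bites when the receiving side does not handle character references properly, as in the other reports in this thread. Below is a minimal sketch using the JDK's DOM parser, assuming a Russian sample string. EntityVsRawUtf8, toEntities and document are hypothetical names: toEntities mimics the entity-forcing behaviour rather than calling Axis.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

public class EntityVsRawUtf8 {

    /** Parses an XML document and returns the text content of its root element. */
    static String parseText(byte[] xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml))
                .getDocumentElement()
                .getTextContent();
    }

    /** Hypothetical: every code point above 0x7F becomes a hex character reference. */
    static String toEntities(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                sb.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    /** Wraps a body in a tiny UTF-8 document and returns its bytes. */
    static byte[] document(String body) {
        return ("<?xml version=\"1.0\" encoding=\"UTF-8\"?><msg>" + body + "</msg>")
                .getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String russian = "\u041F\u0440\u0438\u0432\u0435\u0442"; // "Привет"
        byte[] raw = document(russian);                 // Cyrillic as raw UTF-8 bytes
        byte[] escaped = document(toEntities(russian)); // Cyrillic as &#x...; references
        System.out.println(parseText(raw).equals(russian));     // true
        System.out.println(parseText(escaped).equals(russian)); // true
        System.out.println(raw.length + " bytes vs " + escaped.length + " bytes");
    }
}
```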
This affects me because my application must talk to a web service that doesn't understand XML character entities (I know it should, but fixing the web service is not an option). The only way I can send non-ASCII characters is as raw UTF-8 or ISO-8859-1, which is not possible with Axis.
I tested with Axis 1.2.1 and 1.3. I didn't test the trunk version, but judging from the code in ViewCVS, the problem is still there (class UTF8Encoder).