Davanum,
I haven't been able to follow all changes since the 1.1 release. Did you already implement a new encoding mechanism? Both versions (1.82 & 1.83) will still not work when leaving the Java world (e.g. nusoap), see #15133. To me it looks like the current xmlEncodeString method is way too simplified. See Steve's comment at #15133 too.
Jens

Jens,
Did you try the latest CVS?
Thanks,
dims

Can you please send in a test case against, say, the nusoap interop service
(http://marc.theaimsgroup.com/?l=axis-dev&m=105543299708747&w=2, http://dietrich.ganx4.com/nusoap/)?
Thanks,
dims

Here's a test against
http://dietrich.ganx4.com/nusoap/testbed/round2_base.wsdl .... Works fine. Closing this bug again.
-- dims
=============================================================================
import java.io.*;

public class Main {
    public static void main(String[] args) throws Exception {
        org.soapinterop.InteropLabLocator locator = new org.soapinterop.InteropLabLocator();
        // Test string of umlauts and other accented characters
        String s1 = "\u00dc\u00cb\u00cf\u00d6O\u00e4\u00eb\u00ef\u00f6\u00fc\u00ff";
        org.soapinterop.InteropTestPortType port = locator.getinteropTestPort();
        // Write to the console using code page 850, with auto-flush enabled
        PrintWriter out = new PrintWriter(
            new BufferedWriter(new OutputStreamWriter(System.out, "CP850")), true);
        out.println(s1);
        // Round-trip the string through the remote echoString operation
        String s = port.echoString(s1);
        out.println(s);
    }
}
=============================================================================

Reopening bug. Maybe we should have a flag to aggressively encode character
entities (see Jens' note at http://marc.theaimsgroup.com/?l=axis-dev&m=105689994130473&w=2). Shouldn't this be raised with the nusoap folks as well, since Axis's client works fine?
-- dims

I have taken a look at the Xerces XMLSerializer class. printText() is pretty much what we are
looking for. Would it be OK if we migrate the serializer code and dependent classes AND add support for other encodings at the same time?
Oh, and btw, I don't think this is a nusoap issue. If we declare something as being UTF-8 we should encode strings as UTF-8, whether using the two/three/four byte or the &#... representation.

Jens,
Please go ahead and send in a patch. See patch guidelines at (http://nagoya.apache.org/wiki/apachewiki.cgi?AxisProjectPages/SubmitPatches).
Thanks,
dims

Please don't forget to send in test case(s) as well.
-- dims

Jens, this Xerces serializer code you mention: would it make Axis dependent on
Xerces? Or do we have to cut and paste the relevant portions, thus creating maintenance woes further down the line?
-steve

I was about to choose the most intuitive design pattern: Copy & Paste.
This will include a signature change of XMLUtils, I guess.

Steve:
I forgot to ask: why should encoding create maintenance problems? Encoding is pretty much a fixed topic, as far as I know.

Usually, but anywhere you cut and paste code the copies diverge and your costs/effort
increase. For example, we will need to throw exceptions for \000 and other illegal chars; that may change the code, and then we are off on our own little branch.
NB: what assumptions are we making about encoding? Does Axis assume UTF-8 everywhere it creates/reads XML? Or should the encoding methods be told what encoding to expect and do the right thing for the locale?

Steve:
As always (in the last weeks) I agree with your concerns. And indeed, handling illegal/unexpected characters is one problem area. But I would always consider using well established / tested sources instead of reinventing the wheel. I have moved a bunch of Xerces classes to the Axis tree, and this implementation would allow us to support a lot more than UTF-8/ISO-8859-1. However, this implementation is pretty expensive if we use it the same way as we do right now (by calling a static member for every string). If the Xerces based encoding is of any interest, we should use one XMLStringEncoder instance per request, which uses the incoming request encoding, or a preconfigured encoding in the case of Axis clients.

I have started to migrate (copy and paste) a few Xerces classes to ensure proper umlaut encoding
within Axis (EncodingInfo, EncodingMap, Encodings, Printer). Apart from migrating those classes I tried to achieve the following goals:
1. Support other encoding styles than UTF-8/ISO-8859-1.
2. Make encoding configurable through server-config.wsdd settings, use UTF-8 as default.
3. Provide static, request dependent (ThreadLocal) access to the current encoding name / String encoder within Axis.
4. Support two encoding strategies:
   a) Fixed encoding.
   b) Client call dependent response encoding.
Apart from some questions, "implementing" 1 to 4a is pretty straightforward.

A few questions:
Do you think 4b could be useful at all? It looks like a major change for Axis.
The Xerces Encoder makes use of an internal writer which in turn uses an OutputStream. This is pretty expensive. Also, the current implementation is stateful and not thread safe. Therefore I was about to provide a dedicated StringEncoder for every request using ThreadLocals, similar to AxisEngine.getCurrentMessageContext(). Do you think this is OK? If yes, is a static method in AxisEngine a proper location to access the current StringXmlEncoder?
The Xerces encoder uses sun.io.CharToByteConverter!?
Currently I have to deal with UnsupportedEncodingExceptions and low level IOExceptions within the XML Encoder. In case of an UnsupportedEncodingException I use UTF-8 as fallback (and complain about the wrong encoding name). This would happen once during initialization. However, it is possible to run into IOExceptions for every String->XML encode. During SOAP Response Envelope encoding I could throw a SOAP Fault. What should I do while encoding SOAP Request or AxisFault elements?

The WS-I Basic Profile requirement R1012 states:
A MESSAGE MUST be serialized as either UTF-8 or UTF-16.
I remember reading somewhere that the Axis team wasn't sure about supporting the WS-I Basic Profile.
What is the current state there?

Support for WS-I is a requirement for JAX-RPC 1.1 compliance. As such, we
definitely do plan on supporting all of the requirements listed in the WS-I Basic Profile. The work you are doing looks very positive. Hopefully in six months we will not need to support anything but UTF-8 and UTF-16, since it will be mandated. This in turn will help with interop.

OK. Got it.
Should we still be able to support non-compliant encodings afterwards? If not, I could certainly clean out the Xerces classes a lot.

I will leave this decision to others more qualified to answer.
Compliance with WS-I does not preclude Axis from supporting other encodings. It only states we must use UTF-8 or UTF-16. I think we might have a standards compliance mode or something along those lines. We have yet to discuss it.

I think we should not preclude other encodings from being used. 32 bit Unicode
is a reality, and in some places (China) government regulations stipulate that software must support certain non-Unicode encodings. I've just bounced a question off to SOAPBuilders to see what they think.
To date, Axis does what? UTF-8 only? So moving to UTF-8 and UTF-16 only is an improvement, and brings us in line with WS-I. But if we add support for arbitrary encodings, then we complicate SOAP for everyone. Someone could write clients that post requests in, say, Sanskrit, have it all work on an Axis implementation, and then complain when the service implementors move to gSOAP. For the sake of interop, therefore, keeping the number of encodings we support constant, and matching those everyone else supports, would seem to be a good thing.
-Steve (typing on a UK keyboard)

Agreed. We should stick with the WS-I guidelines.
WS-I testing tools are at: http://www.ws-i.org/implementation.aspx
Note: They use Axis too :)
-- dims

OK.
I will provide a patch which supports UTF-8 and UTF-16 only, with the ability to extend the supported encodings in the future. An active Axis instance will always use one fixed encoding: UTF-8 will be the default, and UTF-16 may be enabled through configuration. This may take a few days (not that this is really complicated ;)

Since we will support UTF-8 and UTF-16 only (for now), the Xerces based implementation was way
too heavy. Therefore I searched for an alternative and found http://czyborra.com/utf/. I have implemented the two encoders based on the algorithms presented there. See the attachment for a proof of concept.
Steve: You said in #15133 that we need to handle chars < 32. Do you have any further details for me? How should we treat ASCII 0? Throw a runtime exception?

Created an attachment (id=7320)
UTF-8/UTF-16 encoder - proof of concept

Sorry, the attachment is a simple .tar.
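
For readers without access to the attachment, here is a minimal sketch of the standard UTF-8 bit layout that a hand-written encoder of this kind would follow. It is not the attached code: the class name Utf8Sketch and its methods are hypothetical, and the sketch produces raw bytes rather than the XML text an Axis encoder would emit. The main method compares the result against the JDK's own UTF-8 encoder as a sanity check.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class, not part of Axis or of the attachment.
class Utf8Sketch {

    /** Encodes a single Unicode code point into 1 to 4 UTF-8 bytes. */
    static byte[] encodeCodePoint(int cp) {
        // Surrogates and out-of-range values are not encodable scalar values.
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("not a Unicode scalar value: " + cp);
        }
        if (cp < 0x80) {            // 0xxxxxxx
            return new byte[] { (byte) cp };
        } else if (cp < 0x800) {    // 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else if (cp < 0x10000) {  // 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else {                    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xF0 | (cp >> 18)),
                (byte) (0x80 | ((cp >> 12) & 0x3F)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
    }

    /** Encodes a whole string, walking it by code point rather than by char. */
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            byte[] b = encodeCodePoint(cp);
            out.write(b, 0, b.length);
            i += Character.charCount(cp);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String umlauts = "\u00dc\u00e4\u00f6\u00fc";
        // Compare the hand-rolled result with the JDK's UTF-8 encoder.
        boolean same = Arrays.equals(encode(umlauts),
                                     umlauts.getBytes(StandardCharsets.UTF_8));
        System.out.println("matches JDK encoder: " + same);
    }
}
```

Note that the sketch rejects unpaired surrogates with an exception, whereas the JDK encoder substitutes a replacement byte; which behaviour Axis should have is a separate decision.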
Jens, see bug ID 15494 regarding handling of zeroes.
Essentially any char < 32 other than tab, CR and LF is illegal, and we should throw a runtime exception to state that fact.

Created an attachment (id=7348)
New UTF8/UTF16 XMLEncoder - fixes #15133, #15494, #19327 (tar.gz)

Created an attachment (id=7349)
Possible patch to use new XML Encoder

Added a new encoder. Test case included.
Tested it using nusoap and a local Axis client (Mac OS X). Maybe we should ask Sascha (see #15133) if those changes still work for him.
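
As a rough illustration of the behaviour discussed in this thread (escape markup characters, reject control characters below 32 other than tab, CR and LF, and optionally write non-ASCII characters as &#... references), here is a minimal sketch. It is not the code from the attachments: the class name XmlTextEscaper and its escape method are hypothetical, and always using numeric references for non-ASCII text is just one of the two representations mentioned above.

```java
// Hypothetical illustration class, not the encoder from attachment 7348.
class XmlTextEscaper {

    /** Escapes text for use in XML content, using &#x...; for non-ASCII characters. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '&') {
                sb.append("&amp;");
            } else if (cp == '<') {
                sb.append("&lt;");
            } else if (cp == '>') {
                sb.append("&gt;");
            } else if (cp == '"') {
                sb.append("&quot;");
            } else if (cp == '\t' || cp == '\n' || cp == '\r') {
                sb.append((char) cp);   // the only legal chars below 32
            } else if (cp < 0x20) {
                throw new IllegalArgumentException(
                    "illegal XML character: 0x" + Integer.toHexString(cp));
            } else if ((cp >= 0xD800 && cp <= 0xDFFF) || cp == 0xFFFE || cp == 0xFFFF) {
                // Unpaired surrogates and U+FFFE/U+FFFF are not legal XML chars.
                throw new IllegalArgumentException(
                    "illegal XML character: 0x" + Integer.toHexString(cp));
            } else if (cp < 0x80) {
                sb.append((char) cp);   // plain ASCII passes through
            } else {
                // "Aggressive" mode: numeric character reference, safe in any encoding.
                sb.append("&#x")
                  .append(Integer.toHexString(cp).toUpperCase())
                  .append(';');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("\u00dcber <b> & co"));
    }
}
```

Because the output of this sketch is pure ASCII, it reads the same whether the message is declared as UTF-8 or as any ASCII-compatible encoding, which is why numeric references can help with peers that mishandle multi-byte sequences.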
Serge Knystautas made changes - 24/Feb/04 04:12 PM
Thiago Jung Bauermann made changes - 16/Dec/05 01:41 AM
Thanks,
dims