My fault, by document I was referring to a file (a physical file on the hard drive), i.e. c:\work\DocumentDeTèst.txt <-- This filename has an accent. I am using the latest version: 2.0 RC2.

"As to getAsciiBytes method, as its name implies it is supposed to return ASCII characters only. So, the behaviour of the method is correct."

Precisely, but because of that the accented characters are converted to ?, i.e. c:\work\DocumentDeTèst.txt --> c:\work\DocumentDeT?st.txt

Eric,
Are you using MultipartPostMethod by any chance? Please give me a bit more detail about what your application is supposed to do and what you are trying to accomplish, so that I do not have to play private detective.

Oleg

Hi Oleg.
Yes, I am using a multipart post. In our application we want to upload files to a file server from a Java application via HTTP. We use multipart because we have to include extra information so that the server application can handle the data (i.e. link the file to a database object, etc.). We also want to be able to upload multiple files (which works well as long as there are no accents in the filenames).

-- Here is the code that builds the file parts

    HttpClient client = new HttpClient();
    MultipartPostMethod httpsPost = new MultipartPostMethod( m_docServer );

    // Set header information
    httpsPost.setRequestHeader("Content-Type", "multipart/form-data; boundary=" + BOUNDS);

    // Adding the main parts.
    StringPart partToAdd = new StringPart("ClassUID", classUID);
    partToAdd.setTransferEncoding(null);
    partToAdd.setContentType(null);
    httpsPost.addPart( partToAdd );

    partToAdd = new StringPart("MethodName", methodName);
    partToAdd.setTransferEncoding(null);
    partToAdd.setContentType(null);
    httpsPost.addPart( partToAdd );

    partToAdd = new StringPart("Params", params);
    partToAdd.setTransferEncoding(null);
    partToAdd.setContentType(null);
    httpsPost.addPart( partToAdd );

    // Adding the file parts.
    int i = 0;
    Iterator iterator = parts.keySet().iterator();
    AI_DOCPART part;
    String partID;
    String partFile;
    FilePart fPart;

    // Loop until we have created all file parts.
    while (iterator.hasNext()) {
        part = (AI_DOCPART) (iterator.next());
        partID = part.getIDAsString();
        partFile = (String) parts.get(part);
        try {
            fPart = new FilePart("FILE" + (i + 1), new File(partFile));
            //partToAdd.setContentType(null);
            //partToAdd.setTransferEncoding( null );
            httpsPost.addPart(fPart);
        } catch (FileNotFoundException e) {
            throw new AIException("ERR_INVALIDE_FILENAME", "",
                    GUIMediator.getStringResource("Corporate", "ERR_INVALIDE_FILENAME"), "");
        }
        partToAdd = new StringPart("PARTNUMBER" + (i + 1), partID);
        partToAdd.setContentType(null);
        partToAdd.setTransferEncoding( null );
        httpsPost.addPart( partToAdd );
        i++;
    }

    // Set timeout in milliseconds -> 30 seconds
    client.setConnectionTimeout( 30000 );

    // Send the data
    int status = 0;
    try {
        status = client.executeMethod(httpsPost);
    }
    ...

Form-based File Upload in HTML specification (RFC 1867)
<http://www.ietf.org/rfc/rfc1867.txt> that HttpClient implements follows the rules of all multipart MIME data streams as outlined in RFC 1521 and RFC 1522. The MIME specification requires all non-ASCII content to be represented using the ASCII charset only. Currently HttpClient does not perform such translation automatically. You will have to take care of filename encoding prior to passing it to the FilePart as a parameter. I was going to contribute a quoted-printable encoder/decoder to the Commons Codec library but never got the chance.

To sum things up: if the relevant RFCs are to be strictly adhered to, the behaviour on the part of HttpClient is correct. However, I do agree that it would be nice if HttpClient took care of non-ASCII charset translation automatically. So, feel free to reopen this bug as a feature request.

Oleg

Re-opened as a feature request
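For illustration, here is a minimal sketch of the kind of filename pre-encoding mentioned above, using only the JDK. It assumes UTF-8 and the RFC 2047 "B" (Base64) encoded-word form; the class and method names are hypothetical and not part of HttpClient. Note that RFC 2047 itself does not sanction encoded-words inside MIME parameter values, and a server-side parser may not decode them, so this only shows what the encoded form looks like.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of HttpClient.
final class EncodedWordFilename {

    // Returns the name unchanged if it is pure ASCII, otherwise an
    // RFC 2047 style encoded-word: =?UTF-8?B?<base64 of UTF-8 bytes>?=
    // Assumption: the name is short enough to fit in a single encoded-word
    // (RFC 2047 limits one encoded-word to 75 characters).
    static String encode(String filename) {
        if (StandardCharsets.US_ASCII.newEncoder().canEncode(filename)) {
            return filename;
        }
        byte[] utf8 = filename.getBytes(StandardCharsets.UTF_8);
        return "=?UTF-8?B?" + Base64.getEncoder().encodeToString(utf8) + "?=";
    }

    public static void main(String[] args) {
        // \u00e8 is the accented "e" from the filename in this report.
        System.out.println(encode("DocumentDeT\u00e8st.txt"));
    }
}
```

The resulting string is pure ASCII, so it would pass through getAsciiBytes unchanged; whether the receiving server turns it back into the original name is a separate question.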
***
Henri Yandell made changes - 12/May/06 02:14 PM
I just wanted to add some interesting RFCs for this feature request:
RFC 2231 RFC 2047 RFC 2184
Oleg Kalnichevski made changes - 12/Dec/06 07:49 PM
I have created a patch against revision 532277 for this problem. Although it is not according to the RFC, it does do the job. For instance, IE does the same for multipart MIME uploads. Not that I am suggesting that IE is doing the right thing, but it does mean that many servers can probably deal with such a post.
Index: src/java/org/apache/commons/httpclient/methods/multipart/FilePart.java
===================================================================
--- src/java/org/apache/commons/httpclient/methods/multipart/FilePart.java (revision 532277)
+++ src/java/org/apache/commons/httpclient/methods/multipart/FilePart.java (working copy)
@@ -193,7 +193,11 @@
         if (filename != null) {
             out.write(FILE_NAME_BYTES);
             out.write(QUOTE_BYTES);
-            out.write(EncodingUtil.getAsciiBytes(filename));
+            //still not the rigth thing according to RFC1522
+            out.write( EncodingUtil.getBytes( filename, this.getCharSet() ) );
+            /*TODO: the right thing would be to do this, but some MIMEDecoders can't handle it.
+            String s = MimeUtility.encodeText(filename);
+            */
             out.write(QUOTE_BYTES);
         }
     }

Dolf,
What is MimeUtility and what package does it come from?

Oleg

Hi Oleg,
Thanks for looking into this and sorry for not making clear where MimeUtility originates from. MimeUtility is from javax.mail (see for instance http://java.sun.com/j2ee/sdk_1.3/techdocs/api/javax/mail/internet/MimeUtility.html).

Dolf

Dolf,
We simply cannot introduce a new dependency for the HttpClient 3.x code line. This will have to wait until 4.0.

Oleg

Hi Oleg,
Maybe the report is not clear. According to the multipart MIME spec, the correct behaviour would be to encode the filename via MimeUtility. The problem with that is that the MIME parsers I have tested with don't handle this correctly. When the filename is simply encoded with the charset of the request, it works, but it is not according to the spec. The patch I handed in works on most MIME parsers (as IE is doing this too) but is not according to the spec. I understand that you don't want to introduce a new dependency, but maybe you don't need to, as the patch works without MimeUtility. The line containing MimeUtility is commented out.

Dolf

Dolf,
My bad. I overlooked the fact that the reference to MimeUtility was inside a comment block.

Sebastian, I believe we can depend on JavaMail, as long as we do not have it in the repository and do not ship it with the release packages. Since we do not bundle dependencies with HttpClient anyway, this should not be a problem for us.

Having said all that, I think Commons Codec, which HttpClient already depends upon, provides all the necessary codecs (BASE64 and quoted-printable). It is just a matter of someone taking up this job.

Folks, any objections to relaxing the compliance with the spec and applying the patch submitted by Dolf?

Oleg

One of the HtmlUnit users came across this bug while trying to upload a file with a non-ASCII name.
Sniffing the traffic generated by IE7 shows that "filename" is encoded with the page charset, as Dolf has kindly suggested. However, IE7 does not send the charset after the 'Content-Type':

---------------------------
Content-Disposition: form-data; name="field_name"; filename="C:\non_ascii.txt"
Content-Type: text/plain
---------------------------

So, to mimic this behaviour exactly, I would appreciate it if the part charset were kept separate from the "Content-Disposition" charset. Many thanks.

We should be spec compliant and not "compatible with most" implementations - I don't care how wrongly IE7 implements this. RFC 2183, Section 2.3 clearly states the limitation to ASCII. People should just accept this limitation instead of trying to bend the standard to their needs. Standards are made to ensure interoperability, for $DEITY's sake. If you need to pass a non-ASCII filename, this is simply not the place for it. You could, for instance, add another text/plain MIME part with a well-defined charset and pass the file name there.
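The workaround suggested above (a separate text/plain part that carries the real file name) might look roughly like this on the wire. This is a JDK-only sketch that renders one such part by hand; the class name, the boundary value and the field name "FILENAME1" are all hypothetical, and the server application would have to know to look for this extra part.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that renders a single multipart/form-data part
// carrying the original file name as UTF-8 text.
final class FilenameSidePart {

    static byte[] render(String boundary, String fieldName, String filename) {
        // Headers are ASCII only; the charset of the body is declared explicitly.
        String headers = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName + "\"\r\n"
                + "Content-Type: text/plain; charset=UTF-8\r\n"
                + "\r\n";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        append(out, headers.getBytes(StandardCharsets.US_ASCII));
        // The body may contain any characters, since its charset is well defined.
        append(out, filename.getBytes(StandardCharsets.UTF_8));
        append(out, "\r\n".getBytes(StandardCharsets.US_ASCII));
        return out.toByteArray();
    }

    private static void append(ByteArrayOutputStream out, byte[] bytes) {
        out.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        byte[] part = render("BOUNDARY", "FILENAME1", "DocumentDeT\u00e8st.txt");
        System.out.println(new String(part, StandardCharsets.UTF_8));
    }
}
```

With this approach the Content-Disposition header of the file part itself can keep an ASCII-only (or placeholder) filename, which stays within the limitation cited from RFC 2183.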
Ortwin, I think the RFC is worded strangely. It is certainly true that Section 2.3 says US-ASCII only, but it seems like that section is outdated.
In Section 2, there is a very large note that reads:

NOTE ON PARAMETER VALUE LENGHTS: A short (length <= 78 characters) parameter value containing only non-`tspecials' characters SHOULD be represented as a single `token'. A short parameter value containing only ASCII characters, but including `tspecials' characters, SHOULD be represented as `quoted-string'. Parameter values longer than 78 characters, or which contain non-ASCII characters, MUST be encoded as specified in [RFC 2184].

Looking at the types of the parameters, four of them are dates and one is an integer. The only one that is a string is the filename, so the note above must refer to it. RFC 2184 describes how to encode the non-ASCII case. Interestingly, it looks like IE does not follow RFC 2184. Section 2.3 refers to RFC 2045, which is older than RFC 2184. Overall, I'd say the RFC is unclear on this issue.

Thanks

Moh

Interesting, although I have never seen it being used in the wild. By the way, RFC 2184 is obsoleted by RFC 2231.
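For reference, here is a minimal sketch of the RFC 2231 extended parameter form (charset, empty language tag, then percent-encoded bytes), written against the JDK only. The class and method names are hypothetical, UTF-8 is an assumed charset choice, and the sketch deliberately percent-encodes more characters than the RFC strictly requires, which is still valid.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpClient.
final class Rfc2231Parameter {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Produces e.g. filename*=UTF-8''DocumentDeT%C3%A8st.txt
    static String encode(String name, String value) {
        StringBuilder sb = new StringBuilder(name).append("*=UTF-8''");
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            // Conservative safe set: ASCII letters, digits, '-', '.', '_'.
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_';
            if (safe) {
                sb.append((char) c);
            } else {
                sb.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("filename", "DocumentDeT\u00e8st.txt"));
    }
}
```

As observed in the thread, this form is rarely seen in practice, so whether a given server understands it has to be checked case by case.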
Roland Weber made changes - 07/Feb/08 08:34 PM
Oleg Kalnichevski made changes - 16/Apr/08 05:49 PM
Oleg Kalnichevski made changes - 21/May/08 05:18 PM
MultipartEntity now encodes non-ASCII characters in the Content-Disposition header using the content charset when used in the browser compatibility mode, and replaces non-ASCII characters with ? when used in the strict mode. One always has the option to encode the file name using one of the standard encoding mechanisms as described in RFC 2231 and RFC 2047.
Closing this issue as resolved. Oleg
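The difference between the two modes described above can be illustrated with plain JDK calls. This is only a sketch of the observable effect on the filename bytes, not HttpClient's actual implementation; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the two filename treatments.
final class DispositionFilenameModes {

    // Strict: US-ASCII only. String.getBytes(Charset) substitutes the
    // charset's replacement byte, '?' for US-ASCII, for unmappable characters.
    static byte[] strict(String filename) {
        return filename.getBytes(StandardCharsets.US_ASCII);
    }

    // Browser-compatible: raw bytes in the content charset, as browsers send them.
    static byte[] browserCompatible(String filename, Charset contentCharset) {
        return filename.getBytes(contentCharset);
    }

    public static void main(String[] args) {
        String name = "DocumentDeT\u00e8st.txt";
        System.out.println(new String(strict(name), StandardCharsets.US_ASCII));
        System.out.println(browserCompatible(name, StandardCharsets.UTF_8).length + " bytes as UTF-8");
    }
}
```

In the browser-compatible case the server has to guess or be told the charset, since the Content-Disposition header itself does not carry one.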
Oleg Kalnichevski made changes - 02/Dec/08 12:50 PM
My apologies, but I do not quite understand the nature of the problem. What do
you mean by 'cannot create a document'? What do you mean by a document in the
first place? Request content body? Response content body?
What version of HttpClient are you using and what is it you are trying to get done?
As to getAsciiBytes method, as its name implies it is supposed to return ASCII
characters only. So, the behaviour of the method is correct.
You might want to have a look at the HttpClient character encoding guide for
more details:
http://jakarta.apache.org/commons/httpclient/charencodings.html
I'll have no choice but to mark the report as invalid unless more information is
given.
Oleg