Key: HTTPCLIENT-368
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: HttpComponents Dev
Reporter: toraneko
Votes: 1
Watchers: 0
HttpComponents HttpClient

[PATCH] Character encoding handling is incorrect in multipart requests

Created: 31/Jul/04 06:38 AM   Updated: 22/Apr/07 07:10 AM
Component/s: HttpClient
Affects Version/s: Snapshot
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments:
commons-httpclient-header-body-encoding.path (text file, 18 kB), 2004-07-31 06:39 AM, toraneko
Environment:
Operating System: other
Platform: All

Bugzilla Id: 30420
Resolution Date: 15/May/06 09:45 PM


Description
Hi,

Commons-Httpclient handles character encoding incorrectly in multipart
requests. This is a significant problem for non-English speakers like me. A
multipart request involves two encodings. The first is the header encoding,
which applies to the headers of each part. The second is the body encoding.
The body encoding works well, but the header encoding is fixed as ASCII. This
problem affects users in the following situations:

* uploading a file whose name contains non-ASCII characters.
* using a parameter that contains non-ASCII characters.

Unfortunately, it seems the RFC doesn't define a header encoding for multipart,
but many people need to set the header encoding for their own language. I have
attached a patch. Please fix this problem.

regards,

Takashi Okamoto

Comments
toraneko added a comment - 31/Jul/04 06:39 AM
Created an attachment (id=12286)
patch for handling header encoding

Ortwin Glück added a comment - 02/Aug/04 02:00 PM
Just a simple question:
How would the server know which encoding was used for the multipart headers?

Adrian Sutton added a comment - 02/Aug/04 02:52 PM
According to RFC 2388 (http://www.faqs.org/rfcs/rfc2388.html), section 3, the encoding of header
values is ASCII, but non-ASCII characters may be encoded according to RFC 2047
(http://www.faqs.org/rfcs/rfc2047.html), which is MIME encoding. So as far as I can tell this patch is
incorrect, but we may want to consider having the capability for HttpClient to use RFC 2047 encoding
automatically when required.

The current workaround is to simply specify the header values pre-encoded according to RFC2047.
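
As a rough illustration of that workaround, the sketch below pre-encodes a value as an RFC 2047 "B" encoded-word using only the JDK. The class name `EncodedWordB` and the choice of UTF-8 are assumptions made for this example; nothing here is part of HttpClient, and the sketch does not enforce the 75-character limit that RFC 2047 places on an encoded-word.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not an HttpClient class: wraps a value as an
// RFC 2047 "B" encoded-word of the form =?charset?B?base64-text?=
final class EncodedWordB {
    private EncodedWordB() {}

    static String encode(String value) {
        // Values that are already pure ASCII are returned unchanged.
        if (StandardCharsets.US_ASCII.newEncoder().canEncode(value)) {
            return value;
        }
        // Assumption: UTF-8 is the charset both sides agree on.
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        return "=?UTF-8?B?" + Base64.getEncoder().encodeToString(bytes) + "?=";
    }

    public static void main(String[] args) {
        // A file name containing non-ASCII characters (written with escapes
        // so the source file itself stays ASCII).
        System.out.println(encode("r\u00e9sum\u00e9.txt"));
    }
}
```

The resulting string would then be passed to HttpClient as the header value in place of the raw name. Because the charset is carried inside the encoded-word, a receiver that understands RFC 2047 can tell which encoding was used.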

Does that seem right to anyone else?

Ortwin Glück added a comment - 02/Aug/04 03:08 PM
Adrian,

Thanks for the hint to RFC-2388. However I am unsure if it really applies here,
as RFC-2388 covers "multipart/form-data" only.

Adrian Sutton added a comment - 02/Aug/04 04:45 PM
Oops, you're right. RFC 1867 is referenced in the JavaDocs for MultipartPostMethod and it says (in
section 7):

Field names originally in non-ASCII character sets may be encoded using the method outlined in RFC
1522.

RFC 1522 is available at http://www.faqs.org/rfcs/rfc1522.html and is also MIME encoding (it's part two
of the document that RFC 2047 is part three of). One of the options it provides is:

4.1. The "B" encoding

   The "B" encoding is identical to the "BASE64" encoding defined by RFC
   1521.

We already have BASE64 encoding available to us. The other option seems to be quoted-printable. I'd
need to read the spec in more detail to be sure about how it all works but it looks like the header values
could be pre-encoded according to RFC 1522 and then passed into HttpClient as I mentioned earlier. It
should wind up looking something like:

=?US-ASCII?Q?Keith_Moore?=

Which is the same format used in email headers.
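
For reference, the "Q" form shown above can be produced with a few lines of plain Java. This is a minimal sketch under stated assumptions: the class name `EncodedWordQ` is hypothetical, the encoder is deliberately conservative (it passes only ASCII letters and digits through literally, maps a space to an underscore, and hex-escapes every other byte), and it does not split long values into several encoded-words.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not an HttpClient class: produces an encoded-word
// of the form =?charset?Q?text?= using a conservative "Q" encoding.
final class EncodedWordQ {
    private EncodedWordQ() {}

    static String encode(String value, Charset charset) {
        StringBuilder sb = new StringBuilder("=?").append(charset.name()).append("?Q?");
        for (byte b : value.getBytes(charset)) {
            int c = b & 0xFF; // treat the byte as unsigned
            if (c == ' ') {
                sb.append('_'); // a space is written as an underscore
            } else if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')) {
                sb.append((char) c); // letters and digits stay literal
            } else {
                sb.append('=').append(String.format("%02X", c)); // everything else becomes =XX
            }
        }
        return sb.append("?=").toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Keith Moore", StandardCharsets.US_ASCII));
    }
}
```

Escaping more characters than strictly necessary keeps the sketch short and still yields a valid encoded-word, at the cost of slightly longer output for punctuation-heavy values.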

In fact, our own JavaDocs quote the standard:

Field names originally in non-ASCII character sets may be encoded using the method outlined in RFC
1522.

We probably should look at implementing support for this at some point then.

Oleg Kalnichevski added a comment - 08/Aug/04 08:06 PM
Folks,
I have already submitted Q-codec and B-codec implementations to the
commons-codec project. Both codecs are available as of release 1.3.

http://jakarta.apache.org/commons/codec/changes-report.html#1_3

With just a few lines of code, non-ASCII character encoding can be implemented on
top of the stock version of HttpClient. Full integration of this feature is
targeted for 4.0.

Oleg

*** This bug has been marked as a duplicate of 24504 ***