[SHINDIG-1981] Wrong encoding of non-file form items in RPC requests with multipart/form-data - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.5.1
Fix Version/s: None
Component/s: Java
Labels: None

Description

We're using RPC requests with multipart/form-data encoding when uploading files. All encoding settings on both frontend and backend are configured to UTF-8, to handle non-ASCII content.

However, even with those settings, non-ASCII content inside the 'request' object still arrived garbled.

Debugging showed that when the JsonRpcServlet parses the request body, it assumes that the encoding of a non-file item is either ISO-8859-1 or is defined in the Content-Type header on that item.
Under HTML5, neither assumption is correct any longer, as per http://dev.w3.org/html5/spec-preview/constraints.html#multipart-form-data

If the algorithm was invoked with an explicit character encoding, let the selected character encoding be that encoding. (This algorithm is used by other specifications, which provide an explicit character encoding to avoid the dependency on the form element described in the next paragraph.)

Otherwise, if the form element has an accept-charset attribute, then, taking into account the characters found in the form data set's names and values, and the character encodings supported by the user agent, select a character encoding from the list given in the form's accept-charset attribute that is an ASCII-compatible character encoding. If none of the encodings are supported, or if none are listed, then let the selected character encoding be UTF-8.

Otherwise, if the document's character encoding is an ASCII-compatible character encoding, then that is the selected character encoding.

Otherwise, let the selected character encoding be UTF-8.

and

The parts of the generated multipart/form-data resource that correspond to non-file fields must not have a Content-Type header specified. Their names and values must be encoded using the character encoding selected above (field names in particular do not get converted to a 7-bit safe encoding as suggested in RFC 2388).
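To illustrate the symptom, here is a minimal standalone sketch of what happens when a browser sends a non-file field as UTF-8 (with no Content-Type header on the part, as the spec requires) and the server decodes it as ISO-8859-1. The class and method names are hypothetical and are not taken from Shindig.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Decodes the raw part bytes the way a parser would if it assumed ISO-8859-1.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Decodes the raw part bytes using the encoding the browser actually used.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Gruesse" with u-umlaut and sharp s, written with escapes so the
        // source file encoding does not matter.
        String original = "Gr\u00fc\u00dfe";
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        // Each non-ASCII character occupies two bytes in UTF-8, so the
        // ISO-8859-1 decoding yields two wrong characters for each of them.
        System.out.println(decodeAsLatin1(wire).equals(original)); // false
        System.out.println(decodeAsUtf8(wire).equals(original));   // true
    }
}
```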

The patch in the review https://reviews.apache.org/r/24449/ fixes the problem for us, by using the request encoding as a default when the content-type header does not specify any other encoding.
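The fallback order described above can be sketched roughly as follows. This is not the code from the review, only an illustration of the idea under stated assumptions: the class name, the method names, and the final ISO-8859-1 fallback for a missing or unknown request encoding are all hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class PartCharsetResolver {
    // Returns the value of the charset parameter in a Content-Type value,
    // or null if the header is absent or carries no charset parameter.
    static String charsetParam(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Picks the charset for a non-file form item:
    // 1. the charset from the part's own Content-Type header, if present;
    // 2. otherwise the encoding of the enclosing request;
    // 3. otherwise ISO-8859-1 (assumed last-resort default in this sketch).
    static Charset resolve(String partContentType, String requestEncoding) {
        String name = charsetParam(partContentType);
        if (name == null) {
            name = requestEncoding;
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Thrown for null, malformed, or unsupported charset names.
            return StandardCharsets.ISO_8859_1;
        }
    }

    public static void main(String[] args) {
        // No Content-Type on the part: the request encoding wins.
        System.out.println(resolve(null, "UTF-8"));
        // An explicit charset on the part still takes precedence.
        System.out.println(resolve("text/plain; charset=UTF-16BE", "UTF-8"));
    }
}
```

In a servlet, the second argument would typically come from the request's character encoding, which may itself be null if the client did not send one.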

I've tested this with Firefox on Linux, and am currently checking that it still works as expected with IE and Chrome.

People

Assignee: Unassigned

Reporter: Andreas Kohn

Votes: 0

Watchers: 1

Dates

Created: 07/Aug/14 13:48

Updated: 07/Aug/14 13:48