  1. Shindig
  2. SHINDIG-1981

Wrong encoding of non-file form items in RPC requests with multipart/form-data

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.5.1
    • Fix Version/s: None
    • Component/s: Java
    • Labels: None

    Description

      We're using RPC requests with multipart/form-data encoding when uploading files. All encoding settings on both frontend and backend are configured to UTF-8, to handle non-ASCII content.

      However, even then the non-ASCII content inside the 'request' object still arrived garbled.

      Debugging showed that when JsonRpcServlet parses the request body, it assumes that the encoding of a non-file item is either defined in the Content-Type header on that item or, failing that, is ISO-8859-1.
      Under HTML5, neither assumption is correct any longer, as per http://dev.w3.org/html5/spec-preview/constraints.html#multipart-form-data:

      If the algorithm was invoked with an explicit character encoding, let the selected character encoding be that encoding. (This algorithm is used by other specifications, which provide an explicit character encoding to avoid the dependency on the form element described in the next paragraph.)

      Otherwise, if the form element has an accept-charset attribute, then, taking into account the characters found in the form data set's names and values, and the character encodings supported by the user agent, select a character encoding from the list given in the form's accept-charset attribute that is an ASCII-compatible character encoding. If none of the encodings are supported, or if none are listed, then let the selected character encoding be UTF-8.

      Otherwise, if the document's character encoding is an ASCII-compatible character encoding, then that is the selected character encoding.

      Otherwise, let the selected character encoding be UTF-8.

      and

      The parts of the generated multipart/form-data resource that correspond to non-file fields must not have a Content-Type header specified. Their names and values must be encoded using the character encoding selected above (field names in particular do not get converted to a 7-bit safe encoding as suggested in RFC 2388).
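
      The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not Shindig code: the class and method names are made up for illustration. It encodes a value as UTF-8 (as a browser would for a UTF-8 form) and decodes the bytes as ISO-8859-1 (the default the servlet is described as assuming).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Shindig.
public class MojibakeDemo {

    /** Simulates a receiver that decodes UTF-8 bytes as ISO-8859-1. */
    static String decodeUtf8BytesAsLatin1(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    /** Reverses the wrong decoding: ISO-8859-1 maps every byte to one char, so no data is lost. */
    static String repair(String garbled) {
        byte[] wire = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sent = "Gr\u00fc\u00dfe"; // "Grüße", written with escapes to stay source-encoding neutral
        String received = decodeUtf8BytesAsLatin1(sent);
        System.out.println("matches original: " + received.equals(sent));
        System.out.println("repaired matches: " + repair(received).equals(sent));
    }
}
```

      Each non-ASCII character occupies two bytes in UTF-8, so the wrongly decoded string contains two characters where the sender wrote one.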

      The patch in the review https://reviews.apache.org/r/24449/ fixes the problem for us, by using the request encoding as a default when the content-type header does not specify any other encoding.
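
      The fallback order described above could look roughly like the sketch below. This is not the patch from the review: the class `FormItemCharset` and its methods are hypothetical, and the final fallback to ISO-8859-1 when the request declares no encoding is an assumption based on the previous default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; names are illustrative and not taken from Shindig.
public class FormItemCharset {

    /** Extracts the charset parameter from a Content-Type value, or returns null if absent. */
    static String charsetParam(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String value = p.substring(8).trim();
                // Strip optional surrounding quotes, e.g. charset="UTF-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    /**
     * Picks the charset for a non-file item: the item's own Content-Type first,
     * then the request encoding, then ISO-8859-1 (assumed legacy default).
     * Charset.forName throws for names the JVM does not support.
     */
    static Charset charsetFor(String itemContentType, String requestEncoding) {
        String name = charsetParam(itemContentType);
        if (name == null) {
            name = requestEncoding;
        }
        return name == null ? StandardCharsets.ISO_8859_1 : Charset.forName(name);
    }

    /** Decodes the raw bytes of a form item using the charset selected above. */
    static String decode(byte[] raw, String itemContentType, String requestEncoding) {
        return new String(raw, charsetFor(itemContentType, requestEncoding));
    }
}
```

      In a servlet, the second argument would typically come from `HttpServletRequest.getCharacterEncoding()`, which returns null when the request does not declare an encoding.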

      I've tested this with Firefox on Linux, and am currently checking that it still works as expected with IE and Chrome.

          People

            Assignee: Unassigned
            Reporter: Andreas Kohn (ankon)