Bug 41705 - Make HTTP Sampler POST using specified encoding
Status: RESOLVED FIXED
Product: JMeter
Classification: Unclassified
Component: HTTP
Version: 2.2
Hardware: All All
Importance: P2 enhancement
Target Milestone: ---
Assigned To: JMeter issues mailing list
Depends on:
Blocks:
Reported: 2007-02-26 06:09 UTC by Alf Hogemark
Modified: 2007-04-15 12:03 UTC



Attachments
Patch for allowing http post with specified encoding (8.36 KB, patch)
2007-02-26 06:20 UTC, Alf Hogemark

Description Alf Hogemark 2007-02-26 06:09:52 UTC
Currently, both variants of the HTTP Request (the normal one and the one using
HTTPClient) use a default encoding when submitting a POST request.

It ought to be possible to specify what encoding should be used when posting the
data to the web server.

The HTTP Request sampler (classes HTTPSampler / PostWriter) currently uses the
default platform character encoding when sending a POST request.
The HTTP Request HTTPClient sampler (class HTTPSampler2) currently uses
ISO-8859-1 as the character encoding when sending a POST request.

It would be nice to be able to specify the encoding to use for the POST in the
HTTP Request parameters section.

As things stand, if you have some values that are encoded as UTF-8, they are
not transmitted correctly to the web server.
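
A minimal JDK-only sketch of why the charset matters: the same String produces different bytes in UTF-8 and ISO-8859-1, and ISO-8859-1 cannot represent characters outside Latin-1 at all. The class name and sample characters are illustrative assumptions, not JMeter code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PostEncodingDemo {

    // Encodes a value as it would be written to the request body, then decodes it
    // again with the same charset, i.e. what a server using that charset would see.
    static String roundTrip(String value, Charset charset) {
        byte[] body = value.getBytes(charset); // unmappable chars become the charset's replacement ('?')
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // U+00F8 (o with stroke) exists in Latin-1; U+20AC (euro sign) does not.
        String value = "\u00F8\u20AC";

        // [-61, -72, -30, -126, -84]: 2 bytes for U+00F8, 3 bytes for U+20AC
        System.out.println(Arrays.toString(value.getBytes(StandardCharsets.UTF_8)));
        // [-8, 63]: 1 byte for U+00F8, and '?' (63) in place of the euro sign
        System.out.println(Arrays.toString(value.getBytes(StandardCharsets.ISO_8859_1)));

        System.out.println(roundTrip(value, StandardCharsets.UTF_8).equals(value));      // true
        System.out.println(roundTrip(value, StandardCharsets.ISO_8859_1).equals(value)); // false
    }
}
```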
Comment 1 Alf Hogemark 2007-02-26 06:15:25 UTC
I think, though I'm not sure, that this bug is related to the following existing
bugs: 41305, 33435.

It also seems somewhat related to 38287, 25753 and 30823, although those all seem
to be about HTTP GET requests rather than HTTP POST requests, which is my case.
Comment 2 Alf Hogemark 2007-02-26 06:20:48 UTC
Created attachment 19638
Patch for allowing http post with specified encoding

Attached is a patch that allows HTTP POST requests to be sent with a
user-specified encoding.

The patch is against
svn.apache.org/repos/asf/jakarta/jmeter/branches/rel-2-2 as of today.

Comments and suggestions on the patch are welcome.
Comment 3 Alf Hogemark 2007-02-26 06:37:30 UTC
Note that the patch above does not change how multipart form data and files are
POSTed; that is handled in the same "sendPostData" method in the "HTTPSampler2"
class. Currently, UTF-8 is hardcoded for multipart/file posts in that method. It
would be trivial to change that code to use the user-specified encoding, as is
now done for normal POST requests.
I haven't changed the posting of files in the "HTTPSampler" class either.

I think that org.apache.jmeter.protocol.http.sampler.HTTPSampleResult, which I
believe is the class that writes the output shown in the "View Results Tree" ->
"Request" pane, displays incorrectly what is sent to the web server.
I think so because it displays the same data before and after I apply the patch
above. "HTTPSampleResult" seems to work only with Strings, rather than showing
the content as actually written to the output stream to the web server. I think
that explains why what it displays differs from what is sent to the web server.
I was thinking about patching "HTTPSampleResult" so that it would print the
name of the encoding used, but I will wait for feedback on the patch above
before doing more work.
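
A String carries no encoding, so displaying it cannot reveal which bytes were actually sent. A rough sketch of the idea of also printing the encoding, assuming a hypothetical `describePostBody` helper (not part of HTTPSampleResult or the patch), renders the body as the bytes produced for the chosen encoding:

```java
import java.nio.charset.Charset;

public class RequestDisplaySketch {

    // Hypothetical helper: describes a POST body as it would go on the wire,
    // naming the charset and listing the encoded bytes in hex.
    static String describePostBody(String postData, String encoding) {
        Charset charset = (encoding == null || encoding.trim().length() == 0)
                ? Charset.defaultCharset()          // no encoding set: platform default
                : Charset.forName(encoding.trim()); // throws if the name is unknown
        byte[] bytes = postData.getBytes(charset);

        StringBuilder sb = new StringBuilder();
        sb.append("Encoding: ").append(charset.name()).append('\n');
        sb.append("Bytes:");
        for (byte b : bytes) {
            sb.append(String.format(" %02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The same String yields two different request bodies.
        System.out.println(describePostBody("a=\u00F8", "UTF-8"));      // Encoding: UTF-8, then Bytes: 61 3D C3 B8
        System.out.println(describePostBody("a=\u00F8", "ISO-8859-1")); // Encoding: ISO-8859-1, then Bytes: 61 3D F8
    }
}
```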
Comment 4 Alf Hogemark 2007-02-27 06:23:29 UTC
The code for PostWriter.java in the patch should probably be changed slightly, from
    String postData = sampler.getQueryString();
    OutputStreamWriter out = new OutputStreamWriter(connection.getOutputStream(), sampler.getContentEncoding());
    out.write(postData);

to
    String postData = sampler.getQueryString();
    OutputStreamWriter out = null;
    if(sampler.getContentEncoding() != null && sampler.getContentEncoding().trim().length() > 0) {
        // Use the encoding the user specified for the sampler
        out = new OutputStreamWriter(connection.getOutputStream(), sampler.getContentEncoding());
    }
    else {
        // No encoding specified: fall back to the platform default, as before
        out = new OutputStreamWriter(connection.getOutputStream());
    }
    out.write(postData);

so that the behavior is backwards compatible, i.e. if no encoding is specified,
the platform default encoding is used.
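
The same fallback can be exercised outside JMeter with a ByteArrayOutputStream standing in for the connection's output stream. This is a self-contained sketch; `openWriter` and `encodeBody` are hypothetical names, and unlike the snippet above it trims the encoding name before use.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class PostBodyWriterSketch {

    // Mirrors the proposed fallback: use the sampler's content encoding when
    // one is set, otherwise the platform default (the old behaviour).
    static Writer openWriter(OutputStream out, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().length() > 0) {
            return new OutputStreamWriter(out, contentEncoding.trim());
        }
        return new OutputStreamWriter(out);
    }

    // Hypothetical helper: encodes postData the way the request body would be written.
    static byte[] encodeBody(String postData, String contentEncoding) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(); // stands in for connection.getOutputStream()
        Writer writer = openWriter(buffer, contentEncoding);
        try {
            writer.write(postData);
        } finally {
            writer.close(); // flushes the encoder before the buffer is read
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String postData = "name=\u00F8";
        System.out.println(encodeBody(postData, "UTF-8").length);      // 7
        System.out.println(encodeBody(postData, "ISO-8859-1").length); // 6
    }
}
```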
Comment 5 Sebb 2007-03-09 17:08:39 UTC
Applied to SVN. It will be in the next nightly build.

What about the file encoding? Is that still needed?

As regards the HTTPSampleResult/View Results issue - probably best to raise a 
separate Bugzilla entry for that.
Comment 6 Alf Hogemark 2007-03-15 03:20:52 UTC
(In reply to comment #5)
> Applied to SVN. It will be in the next nightly build.

Thanks

> 
> What about the file encoding? Is that still needed?
> 

I assume you are referring to the encoding used when sending files as part of
the POST.
The file is sent either as the POST body or as part of a multipart request.

In PostWriter.java, which is used by HTTPSampler.java, the file content is sent
raw in both scenarios, i.e. the bytes are read directly from the input file and
sent unchanged to the web server.
The parameters sent in the multipart request are always encoded as ISO-8859-1;
the encoding specified for the POST request is not used at all in that case.
I think that is correct, because I think the parameters should be encoded using
ISO-8859-1.
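
To make the distinction concrete, here is a simplified, hand-built multipart/form-data body in plain Java. It is a hypothetical sketch, not PostWriter's actual code: the header lines and the ordinary parameter are encoded as ISO-8859-1, while the file content is copied byte for byte, so whatever encoding the file was saved in reaches the server unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class MultipartSketch {
    static final String CRLF = "\r\n";

    // Header lines and ordinary form parameters: always ISO-8859-1,
    // regardless of any "Content encoding" set on the sampler.
    static void writeLatin1(OutputStream out, String s) throws IOException {
        out.write(s.getBytes(StandardCharsets.ISO_8859_1));
    }

    // File content: copied byte for byte, never decoded or re-encoded.
    static void copyRaw(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    static byte[] buildBody(String boundary, String paramName, String paramValue,
                            String fileField, String fileName, String mimeType,
                            InputStream fileContent) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Ordinary parameter part
        writeLatin1(out, "--" + boundary + CRLF);
        writeLatin1(out, "Content-Disposition: form-data; name=\"" + paramName + "\"" + CRLF + CRLF);
        writeLatin1(out, paramValue + CRLF);
        // File part: no "; charset=" on the Content-Type
        writeLatin1(out, "--" + boundary + CRLF);
        writeLatin1(out, "Content-Disposition: form-data; name=\"" + fileField
                + "\"; filename=\"" + fileName + "\"" + CRLF);
        writeLatin1(out, "Content-Type: " + mimeType + CRLF + CRLF);
        copyRaw(fileContent, out);
        // Closing boundary
        writeLatin1(out, CRLF + "--" + boundary + "--" + CRLF);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] fileBytes = "\u00F8".getBytes(StandardCharsets.UTF_8); // a file saved as UTF-8
        byte[] body = buildBody("XyZ", "param", "\u00F8", "file", "a.txt", "text/plain",
                new ByteArrayInputStream(fileBytes));
        System.out.println(body.length + " bytes");
    }
}
```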

In HTTPSampler2.java (the one using HttpClient), the two scenarios are handled
differently, as far as I can see from the code.
If the file is sent as the POST body, the bytes are sent raw.
If the file is sent as multipart, the code currently does this:
//TODO should allow charset to be defined ...
parts[i]= new FilePart(getFileField(), input, getMimetype(), "UTF-8" );//$NON-NLS-1$

To me, this looks as if the code assumes the input text file is encoded in
UTF-8, and it includes "; charset=" in the "Content-Type:" header that is sent in
the multipart section for the file.
In my opinion, this makes HTTPSampler and HTTPSampler2 send different data to
the web server when the file is sent as multipart: HTTPSampler2 always includes
"; charset=UTF-8".

One option is to remove the hardcoded "UTF-8" in HTTPSampler2 and pass null as
the charset when constructing the FilePart. FilePart then falls back to its
default charset, "ISO-8859-1", so one would also have to call "setCharSet(null)"
explicitly on the FilePart. With the charset set to null, the "; charset="
parameter is not included when the multipart request is sent.
I have based this on the source of the classes in the
org.apache.commons.httpclient.methods.multipart package.

The other option is to let HTTPSampler2 use the specified "Content encoding"
parameter when constructing the FilePart. In that case I think PostWriter should
also be changed so that its writeFileToURL method includes "; charset=" in the
"Content-Type" whenever the "Content encoding" parameter is specified for the
sampler.
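
Both options come down to whether a "; charset=" parameter is appended to the file part's Content-Type header. The plain-Java sketch below uses a hypothetical `partContentType` helper (it is not the HttpClient API; there, the header is driven by the charset passed to FilePart or set with setCharSet) to show the header each option would produce.

```java
public class PartContentTypeSketch {

    // Hypothetical helper: the "; charset=" parameter is only emitted
    // when a non-blank charset is actually set.
    static String partContentType(String mimeType, String charset) {
        StringBuilder header = new StringBuilder("Content-Type: ").append(mimeType);
        if (charset != null && charset.trim().length() > 0) {
            header.append("; charset=").append(charset.trim());
        }
        return header.toString();
    }

    public static void main(String[] args) {
        // Option 1: charset set to null, so only the MIME type is sent.
        System.out.println(partContentType("text/plain", null));    // Content-Type: text/plain
        // Option 2: charset taken from the sampler's "Content encoding" field.
        System.out.println(partContentType("text/plain", "UTF-8")); // Content-Type: text/plain; charset=UTF-8
    }
}
```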

I prefer adding support for setting "; charset=" to the "Content encoding"
specified by the user. If others prefer that as well, I can supply a patch.

> As regards the HTTPSampleResult/View Results issue - probably best to raise a 
> separate Bugzilla entry for that.
I might have a closer look at the "View results" issue, and if I do, I will open
a separate entry for it.
Comment 7 Sebb 2007-04-15 12:03:04 UTC
It looks as though this is fixed by the recent commits I made.

If not, please re-open with details of what still needs to be done.