Bug 36898 - regular expression extractor encode again a String encoded in UTF-8
Status: RESOLVED FIXED
Product: JMeter
Classification: Unclassified
Component: Main
Version: 2.1
Hardware: PC Windows XP
Importance: P2 normal
Assigned To: JMeter issues mailing list
Reported: 2005-10-03 16:50 UTC by Darius Hachimarave
Modified: 2005-11-12 18:10 UTC (History)
Description Darius Hachimarave 2005-10-03 16:50:17 UTC
In my thread group, I use SOAP/XML-RPC requests to query my server.
I use the regular expression extractor to build the next request from the
server response.

This works fine if there are no special symbols, but when there is a UTF-8 special
symbol in the server response, the extraction doesn't give the String I'm
expecting.

For example, suppose the server response contains a String like "<City>Vézelay</City>".
With the regular expression extractor I would like to extract the String
"Vézelay", but it gives me the String "VÃ©zelay".
In hexadecimal, the special symbol is C3 00 A9 00 and it becomes C3 B3 C2 A9
after extraction.

It seems that the symbol is being encoded a second time, in UTF-8 or in some other format.

Note: bug 27032 (which is in the resolved state) has some similarities.
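
The re-encoding suspected in this description can be reproduced outside JMeter. The following is a minimal sketch, not JMeter code: the class and method names are made up for illustration, and it assumes the bytes are wrongly decoded as ISO-8859-1 (the Windows default, Cp1252, maps these two bytes to the same characters). It does not try to reproduce the exact hex values quoted above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; none of these names come from JMeter.
class DoubleEncodingDemo {

    // Formats bytes as space-separated upper-case hex, e.g. "C3 A9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes UTF-8 bytes with the wrong charset (assumed here: ISO-8859-1).
    static String misdecode(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "\u00e9" is the letter e with an acute accent.
        byte[] onTheWire = "\u00e9".getBytes(StandardCharsets.UTF_8);
        String mangled = misdecode(onTheWire);

        System.out.println(hex(onTheWire));                                  // C3 A9
        System.out.println(hex(mangled.getBytes(StandardCharsets.UTF_8)));   // C3 83 C2 A9
    }
}
```

Each of the two UTF-8 bytes is turned into its own character by the wrong decoder, which is why one accented letter shows up as two characters after extraction.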
Comment 1 Sebb 2005-10-03 17:19:54 UTC
Does the response get saved correctly if you configure a listener to "Save 
Response Data"?

Or does this also cause the data to be mangled?
Comment 2 peter lin 2005-10-03 17:25:52 UTC
Looking at webserviceSampler, it currently gets a BufferedReader from apache
soap, so it's unlikely the sampler is the problem.

SOAPTransport st = msg.getSOAPTransport();
RESULT.setDataType(SampleResult.TEXT);
BufferedReader br = null;
// check to see if SOAPTransport is not null and receive is
// also not null. hopefully this will improve the error
// reporting. 5/13/05 peter lin
if (st != null && st.receive() != null) {
    br = st.receive();
    if (this.getPropertyAsBoolean(READ_RESPONSE)) {
        StringBuffer buf = new StringBuffer();
        String line;
        while ((line = br.readLine()) != null) {
            buf.append(line);
        }
        RESULT.sampleEnd();
        // set the response
        RESULT.setResponseData(buf.toString().getBytes());

If apache soap doesn't create a reader using the correct encoding, it "could"
cause the problem you see. I don't know apache soap well enough to say with any
certainty that is the case. It could also be a limitation of the assertion.
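
To illustrate the point about the reader's encoding, here is a small sketch using only the JDK (no Apache SOAP or JMeter classes; the class and method names are hypothetical). It reads the same bytes through a reader built with the right and with the wrong charset, using the same readLine loop as above.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; it stands in for whatever reader the SOAP transport returns.
class ReaderEncodingSketch {

    // Reads every line through a reader created with the given charset.
    // Line terminators are dropped, as in the loop quoted above.
    static String readAll(byte[] raw, Charset charset) {
        StringBuilder buf = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(raw), charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                buf.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        byte[] raw = "<City>V\u00e9zelay</City>".getBytes(StandardCharsets.UTF_8);

        String good = readAll(raw, StandardCharsets.UTF_8);
        String bad = readAll(raw, StandardCharsets.ISO_8859_1);

        System.out.println(good.length()); // 20
        System.out.println(bad.length());  // 21, the accented letter became two characters
    }
}
```

Note that getBytes() without an argument, as used in the sampler code above, also depends on the platform default charset, so the bytes stored in the result only match the server's bytes when both steps agree on the encoding.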

peter
Comment 3 Darius Hachimarave 2005-10-03 18:09:01 UTC
(In reply to comment #1)
> Does the response get saved correctly if you configure a listener to "Save 
> Response Data"?
> 
> Or does this also cause the data to be mangled?

Yes, if I save the response data to a file, the data is not transformed. It
is saved correctly.
Comment 4 Darius Hachimarave 2005-10-05 11:09:10 UTC
(In reply to comment #2)
> If apache soap doesn't create a reader using the correct encoding, it "could"
> cause the problem you see. I don't know apache soap well enough to say with any
> certainty that is the case. It could also be a limitation of the assertion.
> 
> peter


I don't think that Apache SOAP is using the wrong encoding in its reader.
If it were, the server response that I read in JMeter (when I write
the response data to a file) would be badly encoded too (just like the String I get
using the regular expression extractor).

Maybe there is a way to specify the String encoding when I use a regular
expression extractor.
Comment 5 peter lin 2005-10-05 15:37:53 UTC
It's possible you're right and the assertion isn't handling the encoding
correctly. Perhaps sebb or mike will know better. JMeter uses oro-matcher, so it
could be that we need to set the encoding? If I have time tonight I'll take a look
at the oro matcher API.

peter
Comment 6 Darius Hachimarave 2005-10-14 15:48:10 UTC
OK, I think I've finally found where the bug is.
It's in the class RegexExtractor of the package org.apache.jmeter.extractor.
In the process() method, a PatternMatcherInput is created from the
response data of the previous result.
At this point, the response data is converted to a String without specifying any encoding.

To correct this bug, I've replaced the line
input = new PatternMatcherInput(useHeaders()
        ? context.getPreviousResult().getResponseHeaders()
        : new String(context.getPreviousResult().getResponseData()));

by 
try {
    input = new PatternMatcherInput(useHeaders()
            ? context.getPreviousResult().getResponseHeaders()
            : new String(context.getPreviousResult().getResponseData(),
                    context.getPreviousResult().getDataEncoding()));
} catch (UnsupportedEncodingException e2) {
    // fall back to the platform default encoding
    input = new PatternMatcherInput(useHeaders()
            ? context.getPreviousResult().getResponseHeaders()
            : new String(context.getPreviousResult().getResponseData()));
}

I don't know if it's the right way to fix it, but now the Strings I extract
are encoded correctly.
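
The idea behind the change can be sketched in isolation as follows. This is only an illustration under assumptions: it uses java.util.regex instead of the ORO matcher that JMeter uses, and the class, the decode helper and the extractCity helper are hypothetical names, not part of JMeter.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch of "decode with the declared encoding, then match".
class EncodingAwareExtract {

    // Decodes with the declared encoding; falls back to the platform default
    // when the encoding name is missing or not supported.
    static String decode(byte[] data, String encodingName) {
        if (encodingName == null) {
            return new String(data);
        }
        try {
            return new String(data, encodingName);
        } catch (UnsupportedEncodingException e) {
            return new String(data);
        }
    }

    // Returns the text between <City> and </City>, or null if there is no match.
    static String extractCity(String body) {
        Matcher m = Pattern.compile("<City>(.*?)</City>").matcher(body);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        byte[] response = "<City>V\u00e9zelay</City>".getBytes(StandardCharsets.UTF_8);

        String right = extractCity(decode(response, "UTF-8"));
        String wrong = extractCity(decode(response, "ISO-8859-1"));

        System.out.println(right.equals("V\u00e9zelay"));        // true
        System.out.println(wrong.equals("V\u00c3\u00a9zelay"));  // true
    }
}
```

The regular expression itself is the same in both cases; only the charset used to turn the response bytes into a String decides whether the extracted value is intact.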

I'm not really familiar with how source code changes are made on a Jakarta
project. Can someone do this? Am I allowed to do it myself?
Comment 7 Sebb 2005-11-13 03:10:01 UTC
Fixed in 2.1 branch code. Will be in 2.1.2