Bug 40589 - Escaped ampersand characters are not unescaped when URL's are visited
Status: RESOLVED FIXED
Product: JMeter
Classification: Unclassified
Component: Main
Version: 2.2
Hardware: PC Windows XP
Importance: P3 normal
Target Milestone: ---
Assigned To: JMeter issues mailing list
Reported: 2006-09-22 21:24 UTC by Jonathan Morace
Modified: 2007-03-18 14:05 UTC (History)

Attachments
Patch for using & instead of &amp; in URL (1.04 KB, patch)
2007-02-27 02:27 UTC, Alf Hogemark

Description Jonathan Morace 2006-09-22 21:24:52 UTC
When using the HTTPRequest sampler with "Retrieve all embedded resources
from HTML files" enabled, JMeter visits escaped URLs without unescaping
them first.  The XHTML spec requires that the "&" character be written as
"&amp;".  So when a stylesheet reference like the following is encountered,
JMeter requests the URL without first unescaping the "&amp;" sequence to "&".

<link rel="stylesheet" type="text/css" href="/Styles.css?one=1&amp;two=2" />

In a dynamic application this can lead to errors and bad report metrics.
Comment 1 Alf Hogemark 2007-02-27 02:27:17 UTC
Created attachment 19643
Patch for using & instead of &amp; in URL

Attached is a patch against
svn.apache.org/repos/asf/jakarta/jmeter/branches/rel-2-2 as of today.

It replaces all &amp; in the URL for an embedded resource with &.
This is needed to be able to test valid XHTML pages, because XHTML will use
&amp; in the href attribute of the a tag. But when the browser, or in this case
JMeter, uses that URL, it must use & and not &amp;.
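
The replacement described above could be sketched roughly as follows. This is a minimal illustration, not the attached patch: the class and method names are hypothetical, and it assumes the extracted URL is available as a plain String before the request is made.

```java
// Hypothetical helper for illustration only; not the code from the attached patch.
final class AmpersandUnescapeSketch {

    private AmpersandUnescapeSketch() {
    }

    /** Replaces every "&amp;" entity in an extracted URL string with a literal "&". */
    static String unescapeAmpersands(String rawUrl) {
        return rawUrl.replace("&amp;", "&");
    }

    public static void main(String[] args) {
        // The stylesheet reference from the bug description.
        String href = "/Styles.css?one=1&amp;two=2";
        System.out.println(unescapeAmpersands(href)); // prints /Styles.css?one=1&two=2
    }
}
```

Only the "&amp;" entity is handled here; other character references that may appear in attribute values are left untouched.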
Comment 2 Alf Hogemark 2007-03-07 01:39:27 UTC
I think it is important to fix this, since more and more sites are using proper
XHTML. The suggested patch seems unproblematic to me.
Comment 3 Alf Hogemark 2007-03-17 12:50:42 UTC
Do you think the suggested patch is wrong, or are you afraid of any side effects?

Any input on the patch, or suggestions on how to solve this, are welcome.
Comment 4 Sebb 2007-03-17 13:35:12 UTC
It does not seem like it will have side effects, as it only applies to 
downloadable resources.

However, I'm not sure that this is the correct place to fix the problem - it 
seems odd to be encoding spaces yet decoding ampersands.

I think it should probably be fixed where the URLs are extracted.
Comment 5 Alf Hogemark 2007-03-18 05:12:34 UTC
If you try the following HTML in your browser:
<html>
<head>
<title>test</title>
</head>
<body>
<p>A test <a href="http://www.google.com/test?somekey=some value&amp;someotherkey=some%20value%20indeed">link</a></p>
</body>
</html>

and click the link, you will see that the browser then tries to fetch this URL:
http://www.google.com/test?somekey=some%20value&someotherkey=some%20value%20indeed
This is the URL you will see in the "Address" field in the browser.

If you want to fix this where the URLs are extracted, do you mean in the
HTMLParser class?
I.e. perhaps add a protected method in HTMLParser, "protected String
getExecutableUrl(String anUrl)", which all of the subclasses of HTMLParser
should call before putting a URL into the URLCollection they are building?

The "getExecutableUrl" method (I wish I had a better name) would then take care
of encoding spaces and decoding "&amp;" HTML entities.
I guess the encoding of spaces is done just to support the incorrect HTML that
a lot of people write. Decoding "&amp;" HTML entities is the
correct thing to do, in my opinion.

Or do you want to change URLCollection or URLString?

Any advice is welcome.
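
A rough sketch of what such a "getExecutableUrl" method might look like is given below. It is written as a static method in a standalone, hypothetical class so that it compiles on its own; the proposal is for a protected method on HTMLParser. It only decodes "&amp;" and encodes literal spaces, which is an assumption about the minimum the method would need to do.

```java
// Hypothetical standalone sketch; in the proposal this would be a protected
// method on HTMLParser, called by subclasses before adding to URLCollection.
final class ExecutableUrlSketch {

    private ExecutableUrlSketch() {
    }

    /**
     * Turns an attribute value as written in the markup into the URL string
     * that would actually be requested.
     */
    static String getExecutableUrl(String anUrl) {
        // Decode the HTML entity first, so the query separator is a real "&".
        String decoded = anUrl.replace("&amp;", "&");
        // Then encode literal spaces, to tolerate markup that leaves them unencoded.
        return decoded.replace(" ", "%20");
    }

    public static void main(String[] args) {
        String href = "http://www.google.com/test?somekey=some value"
                + "&amp;someotherkey=some%20value%20indeed";
        System.out.println(getExecutableUrl(href));
    }
}
```

Run against the href from the HTML example above, this yields the same URL the browser shows in its address field.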

Comment 6 Sebb 2007-03-18 09:50:08 UTC
Since the embedded URL list is actually returned as URLs, rather than strings, 
it seems to me that the decoding from the href (etc) attributes needs to be 
done before the URL is created.

So yes, I think it needs to be done at the parsing stage. This would also 
potentially allow for different decoding depending on the document type (html, 
xhtml).
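
As an illustration of decoding at the parsing stage, before the URL object exists, something along these lines could work. It is only a sketch under assumptions: the class and method names are hypothetical, the base address is a made-up example, and the document-type-specific decoding mentioned above is not modelled (decodeAttribute is simply where it could hook in).

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical sketch; this is not the change that was committed.
final class EmbeddedUrlSketch {

    private EmbeddedUrlSketch() {
    }

    /** Decodes the raw attribute value; only "&amp;" is handled in this sketch. */
    static String decodeAttribute(String attributeValue) {
        return attributeValue.replace("&amp;", "&");
    }

    /** Decodes the attribute first, then resolves it against the page address. */
    static URL toUrl(URI base, String attributeValue) {
        String decoded = decodeAttribute(attributeValue);
        try {
            return base.resolve(decoded).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot build URL from: " + decoded, e);
        }
    }

    public static void main(String[] args) {
        URI page = URI.create("http://example.com/app/page.html"); // made-up base
        System.out.println(toUrl(page, "/Styles.css?one=1&amp;two=2"));
        // prints http://example.com/Styles.css?one=1&two=2
    }
}
```

Because the decoding happens on the string, the resulting URL never contains the entity, so nothing downstream needs to know about it.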

I'll patch this shortly.
Comment 7 Sebb 2007-03-18 14:05:46 UTC
Fixed in SVN.

Will be in nightly builds after r519694