I have a simple redirect rule that looks something like this:
  RewriteCond %{QUERY_STRING} Insurrection=rss
  RewriteRule ^svn/(.*)$ /rss.cgi/$1 [R,L]
Now, the URL points to a file whose name happens to contain various strange characters, so the URL is heavily escaped. The URL looks like:
  href="/svn/example/trunk/tests/CanThisWork&amp%3bInSVN%3f/test?Insurrection=rss"
The end result is that the %3f (which is a "?") and everything after it are stripped off. If I have the page link directly to the target of the rewrite within the HTML, it works.
Note that things are even worse if I try to use [P,L] (proxy) rather than just redirect [R,L], because then other escaped characters cause problems. I have not yet put together a full set of test cases, but the following works with [R,L] and not with [P,L]:
  href="/svn/example/trunk/tests/Really%21%7E%23$%25@%5e*%28%29&%20Nasty/test?Insurrection=rss"
I will put some of the test cases on the public web site at http://svn.sinz.com:8000/ using exactly these rewrite rules.
Specifically, this example does not work:
  http://svn.sinz.com:8000/svn/example/trunk/tests/TestCase-%3f-/test.txt?Insurrection=log
while what should have been the rewritten URL does:
  http://svn.sinz.com:8000/log.cgi/example/trunk/tests/TestCase-%3f-/test.txt?Insurrection=log
I have now put some specific test cases on the web site for public access at the Subversion/Insurrection URL http://svn.sinz.com:8000/svn/example/trunk/tests/
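To make the failure concrete, here is a small Python model (not mod_rewrite's code) of what happens to the TestCase-%3f- URL if, as later comments in this thread suggest, the path is URL-decoded before the rule matches and $1 is pasted into the target without re-escaping. The helper name naive_rewrite is hypothetical.

```python
from urllib.parse import unquote, urlsplit

def naive_rewrite(request_path: str, prefix: str, target: str) -> str:
    """Model of a /svn/... -> /log.cgi/... rule, assuming the path is decoded first."""
    decoded = unquote(request_path)   # %3f has already become a literal '?'
    captured = decoded[len(prefix):]  # what $1 would hold
    return target + captured          # substituted verbatim, no re-escaping

rewritten = naive_rewrite("/svn/example/trunk/tests/TestCase-%3f-/test.txt",
                          "/svn/", "/log.cgi/")
parts = urlsplit(rewritten)
print(parts.path)   # /log.cgi/example/trunk/tests/TestCase-
print(parts.query)  # -/test.txt
```

Under this model the decoded '?' becomes the query-string separator, which matches the "everything after %3f is lost" symptom.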
I have now verified this with Apache 2.0.54.
This is actually worse than I thought. In the redirect case, all CGI parameters that contain URL-escaped characters are munged into being double-escaped. The "%3F" escape within the URL path part no longer appears to be escaped after the rewrite, and so becomes the CGI query introducer. In the proxy case, the CGI parameters are fine, but a number of escape codes in the URL path part get confused. See the tests at http://svn.sinz.com:8000/rewrite-test/index.html (this is against Apache 2.0.54).
Created attachment 15000: Test page that lets you try all of the 7-bit ASCII test cases
This is the source to the test case that I have on http://svn.sinz.com:8000/rewrite-test/index.html
The test code is also available from my Subversion server at http://svn.sinz.com:8000/svn/Web/rewrite-test/
If I remember right, URLs that are rewritten are escaped by default. Maybe there is still a problem above and beyond this, but I didn't see you mention tests using the "NE" option. Per the Apache 2 docs, it is defined as:
  'noescape|NE' (no URI escaping of output)
  This flag keeps mod_rewrite from applying the usual URI escaping rules to the result of a rewrite. Ordinarily, special characters (such as '%', '$', ';', and so on) will be escaped into their hexcode equivalents ('%25', '%24', and '%3B', respectively); this flag prevents this from being done. This allows percent symbols to appear in the output, as in
    RewriteRule /foo/(.*) /bar?arg=P1\%3d$1 [R,NE]
  which would turn '/foo/zed' into a safe request for '/bar?arg=P1=zed'.
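A rough Python illustration of the difference NE makes for the documentation example. quote() with a chosen safe set only approximates mod_rewrite's own escaping; the exact character set mod_rewrite escapes is an assumption here.

```python
from urllib.parse import quote, unquote

# Substitution /bar?arg=P1\%3d$1 after $1 has been filled with "zed"
# (the backslash only stops mod_rewrite reading %3 as a back-reference).
substituted = "/bar?arg=P1%3d" + "zed"

# Stand-in for the default output escaping: '%' becomes '%25'.
default_output = quote(substituted, safe="/?=&")
ne_output = substituted  # [NE]: left exactly as written

print(default_output)      # /bar?arg=P1%253dzed
print(unquote(ne_output))  # /bar?arg=P1=zed
```

Without NE the server sees arg=P1%3dzed after one decoding pass; with NE it sees arg=P1=zed, as the docs describe.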
(In reply to comment #5)
> If I remember right URLs that are rewritten will be escaped by default. Maybe
> there is still a problem above and beyond this but I didn't see you mention
> tests using the "NE" option, per the apache 2 docs this is defined as:
I know what the NE option does, but if you look at the rewrite rules, none of them actually use special characters. However, the URL has special characters in it. The examples at http://svn.sinz.com/rewrite-test/index.html show this: the URL passed into the rewrite engine has the special characters, and yet the data within the CGIs shows that something bad has happened.
The test page referenced above shows three frames, one each for not-rewritten, R-rewritten, and P-rewritten requests, along with what the CGI environment says is going on.
#######################################################################
RewriteRule ^redirect/(.*)$ "/rewrite-test/test.cgi/$1" [R,L]
RewriteRule ^proxy/(.*)$ "/rewrite-test/test.cgi/$1" [P,L]
#######################################################################
Given that all I do is take part of the URL and change it, this should be the correct way of handling it.
(In reply to comment #6)
> RewriteRule ^redirect/(.*)$ "/rewrite-test/test.cgi/$1" [R,L]
>
> Given that all I do is take part of the URL and change it, this should
> be the correct way of handling it.
I'm seeing exactly the same thing, for the same purpose, and NE makes no difference either. I have a redirect that accepts a URL embedded within the URL and passes it to a PHP script, which then redirects to the embedded URL. I'd describe the problem differently: it's as if the matched subpattern has urldecode applied to it before it is passed to the output pattern.
I'm trying to avoid this being a me-too report, so here is a thoroughly unpleasant workaround: if you double-urlencode the incoming parameter, you end up with the string you were expecting in the output pattern. Here's what I want to be passing in:
  http://www.example.com/u/http%3A%2F%2Fwww.apache.org%2F
This rule handles it:
  RewriteRule ^u/(.*) redirect.php?url=$1 [R,L]
mod_rewrite generates an invalid URL:
  redirect.php?url=http://www.apache.org/
If I double-urlencode the embedded URL:
  http://www.example.com/u/http%253A%252F%252Fwww.apache.org%252F
I get:
  redirect.php?url=http%3A%2F%2Fwww.apache.org%2F
which works, but undermines much of the point of having nice tidy mod_rewrite URLs.
Is this bug really this straightforward? I can't think of a circumstance where you'd want this behaviour, and it's so simple that I can't believe it has not been encountered before. I'm running 2.0.54 on Mac OS X and 2.0.52 on RHEL 4, and both behave this way.
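The double-encoding workaround can be checked with Python's urllib.parse, under the assumption described above that the server decodes the path exactly once before $1 is captured.

```python
from urllib.parse import quote, unquote

embedded = "http://www.apache.org/"
once = quote(embedded, safe="")  # http%3A%2F%2Fwww.apache.org%2F
twice = quote(once, safe="")     # http%253A%252F%252Fwww.apache.org%252F

# One decoding pass before ^u/(.*) captures $1:
print(unquote(once))   # http://www.apache.org/  (raw URL lands in ?url=)
print(unquote(twice))  # http%3A%2F%2Fwww.apache.org%2F  (what the script needs)
```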
*** Bug 36986 has been marked as a duplicate of this bug. ***
(In reply to comment #7)
> (In reply to comment #6)
> > RewriteRule ^redirect/(.*)$ "/rewrite-test/test.cgi/$1" [R,L]
> >
> > Given that all I do is take part of the URL and change it, this should
> > be the correct way of handling it.
> [...]
> I'm trying to avoid this being a me-too report, so here is a thoroughly
> unpleasant workaround: if you double-urlencode the incoming parameter,
> you end up with the string you were expecting in the output
> pattern. Here's what I want to be passing in:
[...]
It is unacceptable to double-URL-encode the input, as the input is the valid public URL that a web browser may request. So how do I cause the input to be double-encoded? And how do I deal with the strange side effects of that in the CGI scripts? (And what about when the data itself has to be URL-encoded but the script is called directly?)
Anyway, to me this seems like a significant problem. My workaround is rather nasty (see the Insurrection project) and requires some tricks in both the rewrite rule and the way the CGI parameters are processed (and thus reprocessed after applying the fixup code). It is a very ugly workaround that does not really cover the general case.
I did say that it was unpleasant. It's obviously not a general workaround for public URLs, but if you are in control of both generating and receiving the URLs (as I am in my current context), it's quite workable. Since it's obviously broken, there's not a lot else we can do until it's fixed, at which point I'll remove my double encoding. Ugly but workable beats just plain broken every time.
There is no problem with CGIs: they should expect their parameters to be URL-encoded. I know PHP automatically decodes all parameters; however, bear in mind that this means it may corrupt input data, because it doesn't know that mod_rewrite has already done a decoding pass. If this double decoding has not affected you (e.g. because your input strings don't contain %), then you're just lucky.
Created attachment 17573: Escape internal redirects for 2.0.55
This patch escapes internal redirect requests (just before the message "internal redirect with..."). This is logical as it seems that the redirection is fully processed again (unescaped and so on).
Example of a successful rewrite of the URL http://karel.oldium.home/%/ into http://karel.oldium.net/%/
Rule:
  RewriteCond %{HTTP_HOST} ^karel.oldium.home$
  RewriteRule (.*) http://karel.oldium.net/$1 [last]
Log:
  ...(4) RewriteCond: input='karel.oldium.home' pattern='^karel.oldium.home$' => matched
  ...(2) [per-dir /var/www/oldium.home/www/] rewrite %/ -> http://karel.oldium.net/%/
  ...(2) [per-dir /var/www/oldium.home/www/] implicitly forcing redirect (rc=302) with http://karel.oldium.net/%/
  ...(2) [per-dir /var/www/oldium.home/www/] trying to replace prefix /var/www/oldium.home/www/ with /
  ...(1) [per-dir /var/www/oldium.home/www/] escaping http://karel.oldium.net/%/ for redirect
  ...(1) [per-dir /var/www/oldium.home/www/] redirect to http://karel.oldium.net/%25/ [REDIRECT/302]
This is a correct redirect. The response from Apache to my browser is:
  HTTP/1.1 302 Found
  Date: Thu, 02 Feb 2006 16:51:23 GMT
  Server: Apache
  Location: http://karel.oldium.net/%25/
  ...
  <p>The document has moved <a href="http://karel.oldium.net/%25/">here</a>.</p>
  ...
Example of a local rewrite of the URL http://karel.oldium.home/%/ to the local path /karel/%/
Rule:
  RewriteCond %{HTTP_HOST} ^karel.oldium.home$
  RewriteRule (.*) /karel/$1 [last]
Log:
  ...(4) RewriteCond: input='karel.oldium.home' pattern='^karel.oldium.home$' => matched
  ...(2) [per-dir /var/www/oldium.home/www/] rewrite %/ -> /karel/%/
  ...(2) [per-dir /var/www/oldium.home/www/] trying to replace prefix /var/www/oldium.home/www/ with /
  ...(1) [per-dir /var/www/oldium.home/www/] escaping /karel/%/ for redirect
  ...(1) [per-dir /var/www/oldium.home/www/] internal redirect with /karel/%25/ [INTERNAL REDIRECT]
  .../redir#1] (3) [per-dir /var/www/oldium.home/www/] add path info postfix: /var/www/oldium.home/www/karel/% -> /var/www/oldium.home/www/karel/%/
  .../redir#1] (3) [per-dir /var/www/oldium.home/www/] strip per-dir prefix: /var/www/oldium.home/www/karel/%/ -> karel/%/
  ... continues ...
The percent sign is handled correctly.
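Why escaping before an internal redirect matters (as attachment 17573 does) can be sketched in Python: the rewritten path is still in decoded form, and the internal redirect is decoded again, so escaping it first makes the round trip lossless. This models the idea only; Python's quote() is a stand-in for Apache's own escaping routine.

```python
from urllib.parse import quote, unquote

rewritten = "/karel/%/"     # substitution result, still in decoded form
escaped = quote(rewritten)  # analogue of "escaping /karel/%/ for redirect"
print(escaped)              # /karel/%25/

# The internal redirect is decoded once more, giving back the intended path.
print(unquote(escaped) == rewritten)  # True
```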
Hi there. I just spent an hour or so looking at the mod_rewrite source. Unfortunately, it looks like Apache passes the module the path/filename part of the URL already unescaped. There is a workaround to reverse the unescaping, but you still can't use '/' (%2F), because it is already decoded by the time mod_rewrite gets it, and there's no way to know whether it was escaped in the original URL.
I hacked together a messy fixurl(...) function to re-encode '=', '&', '#' etc., then applied it to the uri variable in the function int apply_rewrite_rule(...), just before:
  rc = (ap_regexec(regexp, uri, AP_MAX_REG_MATCH, regmatch, 0) == 0);
  if (! (( rc && !(p->flags & RULEFLAG_NOTMATCH)) ||
         (!rc &&  (p->flags & RULEFLAG_NOTMATCH))   ) ) {
      return 0;
  }
Like I say, my code is a hack, so I'll leave it up to someone else to provide a better fix/patch.
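A Python sketch of the kind of re-encoding described above. fixurl here is a hypothetical stand-in (the C version was not posted), and the set of characters it re-encodes is an assumption. The last line shows why '/' cannot be recovered this way.

```python
from urllib.parse import unquote

# Hypothetical table of characters that change meaning inside a query string.
_REENCODE = {"%": "%25", "?": "%3F", "#": "%23", "&": "%26", "=": "%3D"}

def fixurl(decoded: str) -> str:
    """Re-encode query-significant characters in an already-decoded path."""
    # Per-character mapping, so an inserted '%xx' is never re-escaped.
    return "".join(_REENCODE.get(ch, ch) for ch in decoded)

print(fixurl("a=1&b?c#d"))  # a%3D1%26b%3Fc%23d

# '/' is the exception: both requests decode to the same string.
print(unquote("/x%2Fy") == unquote("/x/y"))  # True
```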
I'm seeing the same problem with 2.0.55. The following fails:
  RewriteRule ^/a/(.+)$ http://www.example.com/b/$1 [R,L]
This causes the URL:
  http://localhost/a/where%3f/get?id=1
to be mapped to:
  http://www.example.com/b/where?/get
instead of:
  http://www.example.com/b/where%3f/get?id=1
(In reply to comment #15)
> RewriteRule ^/a/(.+)$ http://www.example.com/b/$1 [R,L]
>
> This causes the URL:
>
> http://localhost/a/where%3f/get?id=1
>
> To be mapped to:
>
> http://www.example.com/b/where?/get
>
> instead of:
>
> http://www.example.com/b/where%3f/get?id=1
  RewriteMap esc int:escape
  RewriteRule ^/a/(.+)$ http://www.example.com/b/${esc:$1} [R,L,NE]
Given URL?QS, the core unescapes the URL but not the QS, and mod_rewrite escapes both the URL and the QS that it gets. So [NE] prevents mod_rewrite from doing the escaping, and the RewriteMap causes it to escape the URL it gets, but not the QS it gets.
I'm not sure it's *right*, but it seems to work for me, up to and including 2.0.59.
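A Python model of how this combination behaves for the example URL, assuming int:escape behaves roughly like quote() with '/' left alone, and that [NE] leaves everything else untouched. The function name redirect_with_esc_map is hypothetical.

```python
from urllib.parse import quote, unquote

def redirect_with_esc_map(raw_path: str, raw_qs: str) -> str:
    """Model of ${esc:$1} with [NE] for ^/a/(.+)$ -> http://www.example.com/b/..."""
    decoded = unquote(raw_path)         # core unescapes the path...
    backref = decoded[len("/a/"):]      # ...so $1 holds "where?/get"
    escaped = quote(backref, safe="/")  # approximation of int:escape
    # [NE]: nothing else is escaped, and the original QS is carried over as-is.
    return "http://www.example.com/b/" + escaped + "?" + raw_qs

print(redirect_with_esc_map("/a/where%3f/get", "id=1"))
# http://www.example.com/b/where%3F/get?id=1
```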
(In reply to comment #16)
> So [NE] prevents rewrite doing the escaping, and
> the RewriteMap causes it to escape the URL it gets, but not the QS it gets.
>
> I'm not sure it's *right*
Yes, I think so, but this is another problem, not directly related to the one described here (rewriting the rule pattern into the query string).
The current behavior is IMHO wrong. If we force a redirect, the query string should remain untouched by any escaping intended for URI paths, because that modifies the query string in an unexpected way. A QS like 'foo%26bar' (Location header /foo?foo%2526bar) results in foo%2526bar, which is no longer equal to the original query string, while a URI like /foo%bar (Location header /foo%25bar) results in /foo%bar. Some specific characters within the query string must still be escaped, though (such as spaces).
(In reply to comment #5)
> 'noescape|NE' (no URI escaping of output)
Yes, the NE flag prevents that, but the URL path may now be invalid, since it also prevents the URL path from being escaped.
(In reply to comment #11)
> Escape internal redirects for 2.0.55
>
> This patch escapes internal redirect requests (just before the message
> "internal redirect with..."). This is logical as it seems that the redirection
> is fully processed again (unescaped and so on).
Yes, this would be logical. The main difference is that this doesn't touch the query string.
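The asymmetry described above can be modeled in Python, assuming the same path-style escaping (approximated here with quote()) is applied both to the decoded path and to the query string, which was never decoded.

```python
from urllib.parse import quote, unquote

raw_path, raw_qs = "/foo%25bar", "foo%26bar"  # client sent /foo%25bar?foo%26bar

decoded_path = unquote(raw_path)         # core decodes the path: /foo%bar
location_path = quote(decoded_path)      # /foo%25bar
location_qs = quote(raw_qs, safe="=&")   # QS was never decoded: foo%2526bar

print(unquote(location_path) == decoded_path)   # True: the path survives
print(unquote(location_qs) == unquote(raw_qs))  # False: foo%26bar != foo&bar
```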
This bug is a killer for me using PHP and its urlencode() function. Basically, urlencode() encodes a space as a literal '+' in the URL and escapes a literal '+' as %2b. The problem is that once we hit the RewriteRule, the space is still encoded as a literal '+', and the '%2b' is decoded to a literal '+' as well. As you can imagine, the RewriteMap solution doesn't work, and I'm left with no solution but to double-encode, which is horrible.
Is there a reason the hex entities must be decoded before the RewriteRules are applied? Is it due to the 'being a path' way of thinking? A lot of URLs are used not only as a path to a resource but also to pass information.
This is what I'd like to see:
  # accept a-zA-Z and %2b (escaped '+')
  RewriteRule ^resource/(([a-z]|%2b)+)$ /resource.ext?data=$1 [NC]
This would still fail on, say, '/resource/info%', as it's not the sequence %2b etc., and would use the first matching rule for something like '/resource/%2' with:
  RewriteRule ^resource/(([a-z]|%2|%2b)+)$ /resource.ext?data=$1 [NC]
I'd love to hear everyone's opinion on this, as I'm not sure whether it would be the correct way to handle it or whether it would lead to security concerns etc. If there is agreement, I'll have a stab at implementing it and see where it leads; if it is fundamentally wrong and you have some resources, I would love to know that too.
Thanks
Michael
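A Python illustration of the ambiguity, using quote_plus() as an analogue of PHP's urlencode() and assuming the single path-decoding pass described in this thread (which gives '+' no special meaning). The helper name seen_by_rewriterule is hypothetical.

```python
from urllib.parse import quote_plus, unquote

def seen_by_rewriterule(value: str) -> str:
    """urlencode()-style encoding, then one path-style decoding pass."""
    return unquote(quote_plus(value))

a = seen_by_rewriterule("c++ stuff")  # sent as c%2B%2B+stuff
b = seen_by_rewriterule("c   stuff")  # sent as c+++stuff
print(a, b)    # c+++stuff c+++stuff
print(a == b)  # True: the two inputs can no longer be told apart
```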
(In reply to comment #18)
> Basically this encodes a space as a literal '+' in the url and escapes a literal
> '+' as %2b, the problem is that once we hit the RewriteRule the space is still
> encoded as a literal '+' and the literal '%2b' is decoded to be a literal '+'
> as well.
Rewriting URI paths into the query string isn't safe, because the two have different encoding rules, which creates problems if you mix them. But you can process $_SERVER['REQUEST_URI'] within PHP, or capture the variables from the unparsed THE_REQUEST variable with a RewriteCond.
I think this bug describes the same problem as bug 23295.
You need to use the [NE] (NoEscape) flag in order to disable this escaping behavior.
I'm sorry. I didn't read the entire history of the bug. Reopening on the chance that someone else knows more about this than I do.
From reading bug 23295, I would say that it is related but not the same problem. Here the escaping is overdone (characters in the query string get escaped), while in bug 23295 the escaping was insufficient (in the URI part). (Or, if you use NE, part of the URI is escaped but part is not, and we once again get a failure.)
First: wow, I didn't know this bug was still open...
(In reply to comment #19)
> But you can process $_SERVER['REQUEST_URI'] within php or catch the variables
> from the unparsed ENV THE_REQUEST with a RewriteCond.
Yeah, this is the method I went with in the end... and MediaWiki does the same thing, for those who are curious.
As for the correct behaviour... well, as has been mentioned, the rules for escaping are slightly different between the path and query parts of the URL. Say you have
  RewriteRule ^(.*)$ pages/test.php?s=$1 [L]
If you use the URL /1&a=1 or /1%26a%3d1, the resulting internal URL ends up as pages/test.php?s=1&a=1. In other words, mod_rewrite matches the path part after it has been decoded and then copies it directly into the query without re-encoding it.
So the question is: should mod_rewrite parse URLs before or after URL decoding (maybe Apache decodes the URL before it is passed to mod_rewrite?), and should it re-encode data when it is copied into the query, or leave that up to the script?
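The /1&a=1 versus /1%26a%3d1 example can be reproduced with a small Python model, assuming the per-directory pattern sees the decoded path (without its leading slash) and $1 is copied into the query string unchanged. rewritten_params is a hypothetical helper.

```python
from urllib.parse import parse_qs, unquote, urlsplit

def rewritten_params(request: str) -> dict:
    """Model of ^(.*)$ -> pages/test.php?s=$1 with no re-encoding of $1."""
    captured = unquote(request).lstrip("/")  # pattern sees the decoded path
    target = "pages/test.php?s=" + captured
    return parse_qs(urlsplit(target).query)

print(rewritten_params("/1&a=1"))      # {'s': ['1'], 'a': ['1']}
print(rewritten_params("/1%26a%3d1"))  # {'s': ['1'], 'a': ['1']}
```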
(In reply to comment #24)
> So the question is, should mod_rewrite parse the urls before or after url
> decoding (maybe apache decodes before the url is passed to mod_rewrite?),
You might want to read http://issues.apache.org/bugzilla/show_bug.cgi?id=32328#c12 where I tried to explain how mod_rewrite's processing within the directory context works.
Thanks for all the input; knowing how to get around this, and what the likely reason is, has helped me out of my deranged hysteria for another day. I'd like to ask, though: does anyone have a pointer to some information on why this ambiguous behaviour is implemented, e.g. what security concerns there are for paths? This all has me wondering about the validity of using hex-coded entities in an SEF-style URL (are there other uses that require said query string/path mangling?).
Keep up the great work!
Michael
Created attachment 20217: Adds escaping functionality to backreferences
This patch adds a new flag to the RewriteRule statement: adding the flag [B] (or [backrefescaping]) forces mod_rewrite to escape backreferences in the rewrite target. E.g.
  RewriteRule ^(.*)$ index.php?show=$1 [B,L]
In the given example, a request to http://example.com/C++ (or http://example.com/C%2B%2B) would be redirected internally to index.php?show=C%2B%2B instead of index.php?show=C++
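A Python approximation of what the proposed [B] flag changes for the C++ example. The escaping in the escape_backrefs branch is a stand-in using quote(); the exact characters the patch escapes may differ. rewrite_show is a hypothetical helper.

```python
from urllib.parse import parse_qs, quote, unquote

def rewrite_show(request_path: str, escape_backrefs: bool) -> str:
    """Model of ^(.*)$ -> index.php?show=$1, with or without [B]."""
    captured = unquote(request_path).lstrip("/")  # decoded path, as before
    if escape_backrefs:                           # rough model of [B]
        captured = quote(captured, safe="")
    return "show=" + captured

print(parse_qs(rewrite_show("/C%2B%2B", escape_backrefs=False)))  # {'show': ['C  ']}
print(parse_qs(rewrite_show("/C%2B%2B", escape_backrefs=True)))   # {'show': ['C++']}
```

Without the flag, the script's query parser turns each '+' into a space; with it, the value arrives intact.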
Just to post my workaround for my PHP problem with ambiguous escaping of '+' signs in a rewrite rule, so someone might find it useful; thanks to Bob Ionescu and Mike Weller for their leads.
In the .htaccess file or vhost:
  # accept letters, plus signs and encoded plus signs
  RewriteCond %{THE_REQUEST} /test/(([a-z]|%2b|\+)+)/? [NC]
  RewriteRule . test.php?cat=%1 [NE,L]
PHP code for test.php:
  <?php print_r($_GET); ?>
A URL with spaces encoded as +'s:
  www.domain.com/test/c++stuff
gives Array ( [cat] => c  stuff ), because PHP decodes each plus sign to a space (the two spaces display as one in a browser).
A URL with encoded +'s:
  www.domain.com/test/c%2b%2bstuff
gives Array ( [cat] => c++stuff ), correctly decoding the encoded +.
Hope this helps someone.
Some of the comments here seem to suggest that all this is the expected behavior. Well, I, for one, don't get it. Allow me to elaborate on my experience with this bug.
On my site, I direct searches through /search/ followed by the search query and then another trailing slash. The rule I use is
  RewriteRule ^search(/(.+))?/$ /index.php?page=search&query=$2 [L]
This works fine for queries that don't contain a slash. If I were to search for "9/10", for example, the requested path would become /search/9%2F10/
From the discussion here, I gather that that won't work, and indeed, double encoding fixes it. Personally, I hate that solution, but, more importantly:
1. The error I get is a 404. First of all, the content of that error page has the %2F decoded to /, which I don't fully understand. But the really weird thing is that my ErrorDocument 404 applies in all cases except this one: I get a standard black-on-white 404 page for some reason.
2. Since escaping is allegedly the problem, it should work if I omit the query from the eventual URL, right? However, even the trivial
  RewriteRule ^search /index.php [L]
fails to match /search/9%2F10/.
I realize this is not a help forum, but I would sure appreciate some input. I'm sorry if this is a different bug, but it seemed related.
Getting 404 for %2F indicates that you need to look at the AllowEncodedSlashes directive. (I have nothing to say about other issues reported here.)
I've spent the whole day debugging mod_rewrite and finally found this bug. Has it been fixed and incorporated into the latest release? It's quite annoying to have such a bug in the most widely used HTTP server on the net.
*** Bug 39746 has been marked as a duplicate of this bug. ***
Patch from comment #27 committed to /trunk/ in r573831
(In reply to comment #33) > Patch from comment #27 committed to /trunk/ in r573831 It's taken over 2 years for this to be resolved. The power of open source, eh?
(In reply to comment #34) > (In reply to comment #33) > > Patch from comment #27 committed to /trunk/ in r573831 > > It's taken over 2 years for this to be resolved. The power of open source, eh? The patch has been around for longer. Your option to fix it yourself, or pay someone to fix it, or work around it, has always been around.
*** Bug 23295 has been marked as a duplicate of this bug. ***
I'm not sure people are successfully convincing each other that this IS a bug. Let's take another example:
Incoming URL:
  /foo?bar=%3Abaz
This is a perfectly legal URL, right? The "%3A" is a perfectly legally encoded ":" character; that is the way it OUGHT to be included. Now let's say I want to redirect all /foo URLs to an external server:
  RewriteRule /foo http://somewhere.else.com/other [R]
Expected behavior, redirect to:
  http://somewhere.else.com/other?bar=%3Abaz
Yes? ACTUAL behavior, redirect to:
  http://somewhere.else.com/other?bar=%253Abaz
Some of you are arguing that this is intended behavior? How can this possibly be? I got a perfectly legal URL in, with a perfectly legal query string. My RewriteRule should be expected to leave the query string exactly intact, right? Yet it corrupts it to mean something else. To me, this is obviously a bug. [And one that's causing me a serious problem at the moment, to boot.]
I certainly don't think this is correct behavior, and some of the discussion here seems to have missed the point of the rewrite problem that I initially reported.
(In reply to comment #35)
> (In reply to comment #34)
> > (In reply to comment #33)
> > > Patch from comment #27 committed to /trunk/ in r573831
> >
> > It's taken over 2 years for this to be resolved. The power of open source, eh?
>
> The patch has been around for longer.
>
> Your option to fix it yourself, or pay someone to fix it, or work around it, has
> always been around.
As Jonathan Rochkind stated above:
  Incoming URL: /foo?bar=%3Abaz
  Expected behavior, redirect to: http://somewhere.else.com/other?bar=%3Abaz
  ACTUAL behavior, redirect to: http://somewhere.else.com/other?bar=%253Abaz
I think mod_rewrite should not re-encode unless I tell it to, or at the very least, let me tell it not to.
If a patch exists and has not been released, then what kind of money are we talking about here to get it fixed in a major release? I don't have a job, but I'd be willing to put a few dollars towards getting this fixed.
(In reply to comment #39)
> I think mod_rewrite should not reencode unless I tell it to, or at the very
> least, let me tell it not to.
I quite agree. Those who say it's correct should be campaigning for a documentation change saying "it's not possible to pass URL-unsafe parameters (i.e. those that require urlencoding) through mod_rewrite". I suspect that the vast majority of tutorials, documentation and articles about mod_rewrite are broken by this bug; the only reason they work as they are is pure luck and simplistic examples.
> If a patch exists and has not been released then what kind of money are we
> talking here to get it fixed in a major release? I don't have a job but I'd be
> willing to put a few dollars towards getting this fixed.
Me too. Without a patch, the choices are: don't use Apache, don't use mod_rewrite, or (shiver) double-encode everything. I have another workaround that's workable at the moment: instead of urlencoding params, I base64-encode them. Really ugly, but it works.
(In reply to comment #18)
> This bug is a killer for me using PHP and its URLENCODE function.
> Basically this encodes a space as a literal '+' in the url and escapes a literal
> '+' as %2b
There's an easy workaround for that: use rawurlencode() instead, which encodes spaces as %20 instead of +. It will still suffer from this bug if the string contains any params that get rewritten.
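A Python comparison under the decode-once-as-a-path model used in this thread, with quote_plus() standing in for urlencode() and quote(..., safe="") for rawurlencode(). rawurlencode-style encoding round-trips while the data stays in the path, but once the decoded value is copied into a query string and decoded again as form data, the literal '+' is still lost, which matches the caveat above.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

value = "a b+c"
as_urlencode = quote_plus(value)         # 'a+b%2Bc', like urlencode()
as_rawurlencode = quote(value, safe="")  # 'a%20b%2Bc', like rawurlencode()

# Data that stays in the path is decoded once, with no special meaning for '+'.
path_urlencode = unquote(as_urlencode)        # 'a+b+c' (space lost)
path_rawurlencode = unquote(as_rawurlencode)  # 'a b+c' (round-trips)

# Data copied into a query string is decoded again as form data.
query_rawurlencode = unquote_plus(path_rawurlencode)  # 'a b c' ('+' lost)
print(path_urlencode, path_rawurlencode, query_rawurlencode, sep=" | ")
```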
Folks - we know all about this bug, and it still needs someone to find time to tidy up the patch. See http://marc.info/?t=118925575100001&r=1&w=2 for why the existing patch isn't considered quite good enough.
Awesome, thanks. Reassuring.
Fixed in r589615
*** Bug 42610 has been marked as a duplicate of this bug. ***
This bug doesn't seem to be fixed after all. See https://issues.apache.org/bugzilla/show_bug.cgi?id=45529
Log now confirms the bug is still there - see again bug 45529
Here is an application of the PHP solution above, if your web host still runs an Apache with this bug (the bug where an external .htaccess redirect double-encodes URL parameters).
In .htaccess:
  RewriteCond ...
  RewriteRule .... /phplist_redirect10.php [L]
In the file phplist_redirect10.php:
  <?php
  header(
      'Location: http://' . $_SERVER['SERVER_NAME']
      . preg_replace(
          '/^\/myfolder([\-_a-zA-Z0-9]+)\/(.*)$/',
          '/myfolder/$1/$2',
          $_SERVER['REQUEST_URI']
      ),
      TRUE,
      301 // 301 for a permanent redirect; 302 or 303 for a temporary one
  );
  exit; // send nothing else after the redirect header
$_SERVER['REQUEST_URI'] is the path and query string as requested by the browser, including the leading slash, e.g. /myfolder/myfolder/some.php?a=5 (a #fragment is never sent to the server, so it does not appear here).
Per last few comments, this is still a problem. Comment 39 puts it well: > I think mod_rewrite should not reencode unless I tell it > to, or at the very least, let me tell it not to. Until the "let me tell it not to" is implemented, this needs to stay open. I'm running across this double-encoding problem on a proxied Perl app.
[NE] is required to avoid escaping the substitution, whether you include characters that need escaping inline or via a backreference. Additionally, query strings that aren't modified are no longer escaped in 2.3 [this is one of the follow-up bug reports that should have been a separate bug].
Please open separate bugs for separate rewrite issues if you'd like them reconsidered. Even if you want to re-open this bug, I'd suggest you instead open a new bug with less baggage.
changing disposition to WORKSFORSOME, bug too muddled for a proper closing code.