This patch adds ampersand escaping to apache2, as recently posted to the dev@httpd list.

Example use (from URL above):

    RewriteMap ampescape int:ampescape
    RewriteRule ^/(.*)$ /index.php?title=${ampescape:$1} [L,QSA]

Regards,
Christian Parpart.
Created attachment 13507: adds the ampescape function. And here is the patch.
Created attachment 13508: adds the ampescape function.
* Adapted the patch to ASF's coding style.
* The old patch was against 2.0.52; this patch is against HEAD.
Looks good here, and this has been added as a default patch in Gentoo.
Too special-purpose, as discussed on dev (a long time ago). So I'm still -1 on it.
André Malo, maybe you have the time to write the proposed more generic extension? Because I actually don't have it :(
I have to agree with André's -1. We could end up adding a bunch of these to handle all sorts of special cases, and that hardly makes sense. If anything, we need a way to do something like the unix tr, not a special-case "hack".
Could someone tell me what the problem (?) described at that URL has to do with the patch? The "obvious" RewriteRule there is just plain wrong:

    RewriteRule ^/(.*)\?(.*)$ /index.php?title=$1&$2 [L]

RewriteRules don't match the query string. Period. There's no known issue about it. The obvious rule would be:

    RewriteRule ^/(.*) /index.php?title=$1 [L,QSA]

What am I missing?
The problem with rewriting /(.*) to /index.php?title=$1 is that a $1 containing & would not be escaped correctly, even if the user's URL had escaped & as %26. For example, /AT%26T would be rewritten to /index.php?title=AT&T instead of /index.php?title=AT%26T, causing title to contain only 'AT' instead of the expected 'AT&T'. I think this patch is important even though it is too special, because & is an important character in query strings, just as / is a very important character in path strings. It is quite possible that this case would come up more often with other web applications if people made more use of mod_rewrite.
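To make the AT%26T case above concrete, here is a minimal stand-alone sketch in plain C. It does not use any httpd or APR code: `percent_decode`, `naive_rewrite`, and `title_value` are hypothetical helpers invented for illustration, and they only imitate the sequence described in the comment (the path is unescaped, substituted into the query string verbatim, and the query is then split on `&`).

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decode %XX sequences from src into dst. */
static void percent_decode(const char *src, char *dst, size_t dstlen)
{
    size_t n = 0;

    if (dstlen == 0) {
        return;
    }
    while (*src != '\0' && n + 1 < dstlen) {
        if (src[0] == '%' && isxdigit((unsigned char)src[1])
            && isxdigit((unsigned char)src[2])) {
            char hex[3];
            hex[0] = src[1];
            hex[1] = src[2];
            hex[2] = '\0';
            dst[n++] = (char)strtol(hex, NULL, 16);
            src += 3;
        }
        else {
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
}

/* Hypothetical helper: imitate "^/(.*) -> /index.php?title=$1" applied
 * to an already-unescaped path, with no re-escaping of the capture. */
static void naive_rewrite(const char *path, char *out, size_t outlen)
{
    char decoded[256];

    if (*path == '/') {
        path++;
    }
    percent_decode(path, decoded, sizeof decoded);
    snprintf(out, outlen, "/index.php?title=%s", decoded);
}

/* Hypothetical helper: copy the raw value of "title" from a query
 * string, i.e. everything after "title=" up to the next '&'. */
static void title_value(const char *query, char *out, size_t outlen)
{
    const char *p = strstr(query, "title=");
    size_t n;

    if (outlen == 0) {
        return;
    }
    if (p == NULL) {
        out[0] = '\0';
        return;
    }
    p += strlen("title=");
    n = strcspn(p, "&");
    if (n >= outlen) {
        n = outlen - 1;
    }
    memcpy(out, p, n);
    out[n] = '\0';
}
```

Feeding `/AT%26T` through `naive_rewrite` and then passing the part after `?` to `title_value` yields the truncated title `AT`, which is the behaviour this comment complains about.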
From the latest patch:

    unsigned char *copy = (char *)apr_palloc(r->pool, 3 * strlen(key) + 3);

Shouldn't that be

    char *copy = (char *)apr_palloc(r->pool, 3 * strlen(key) + 3);

since you're casting to (char *) rather than (unsigned char *), _and_ since the function returns char * rather than unsigned char * as per its definition?
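For readers without the attachment at hand, the following is a rough sketch of what an ampersand-escaping function with a consistent `char *` type could look like. It is not the code from the patch: `amp_escape` is a hypothetical name, and plain `malloc` stands in for `apr_palloc(r->pool, ...)` so the example compiles without APR. The buffer size follows the same worst-case reasoning as the quoted line (every input byte may expand to the three bytes of `%26`), with one extra byte for the terminating NUL.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the patch's map function. Returns a newly
 * allocated copy of key with every '&' replaced by "%26", or NULL if
 * the allocation fails. The caller frees the result. */
static char *amp_escape(const char *key)
{
    /* Worst case: each byte becomes "%26" (3 bytes), plus the NUL. */
    char *copy = malloc(3 * strlen(key) + 1);
    char *d = copy;

    if (copy == NULL) {
        return NULL;
    }
    for (; *key != '\0'; ++key) {
        if (*key == '&') {
            memcpy(d, "%26", 3);
            d += 3;
        }
        else {
            *d++ = *key;
        }
    }
    *d = '\0';
    return copy;
}
```

With the pointer declared as `char *` throughout, no cast is needed on the allocation and the return type matches the declaration, which is the point raised in this comment.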
Yeah, that makes sense in any case; however, there are more "unsigned" qualifiers that might then be eliminated as well. Some (longer) time ago, members of the httpd-dev mailing list recommended writing a MORE GENERIC variant of this patch. I can't remember the details exactly, but it should be done anyway in order to get something like this functionality in. (I'm still not that familiar with this kind of Apache API anyway :(
The same problem occurs with # (%23) and is even more destructive there:

    RewriteRule ^/(.*) /index.php?title=$1&something=else

/Foo%23Bar will get rewritten to:

    /index.php?title=Foo#Bar&something=else

The 'Bar&something=else' part is interpreted as a fragment identifier (i.e. page anchor) and ignored on the server side.

The proposed patch is pretty short-sighted because it only treats one symptom, not the cause. Why does mod_rewrite need to unescape these characters in the first place? Special characters like & and # do not mean the same as %26 and %23 within the context of a URL. By unescaping, this information is lost... At the very least, this unescaping should be optional.

I think you can fix most issues by just using the 'escape' RewriteMap on the substitution, but this is far from practical, as it needs to be set globally for the entire server. This rules it out for hosted environments, where usually the most you get is .htaccess. Is there any reason why the built-in map functions (toupper, tolower, escape, unescape) still need a very redundant RewriteMap directive?

So I guess the optimal solution would be either:
- Allow you to turn off this automatic unescaping with a RewriteRule flag (or similar) in .htaccess, or
- Allow you to use the built-in map functions directly without requiring those redundant RewriteMap directives
(In reply to comment #11)
> Why does mod rewrite need to unescape these characters in the first place? Special characters like &
> and # do not mean the same as %26 and %23 within in the context of an URL. By unescaping, this
> information is being lost...

At the very beginning, when the internal request processing starts, Apache unescapes the URL-path once. This is not done by mod_rewrite; it happens before mod_rewrite is involved, and I think it is also part of the security concept.

If you are using your rewrite rules in directory context, you have a filename (a physical path, e.g. /var/www/abc) while the per-dir prefix is stripped (so you're matching only against the local path 'abc' if your rules are stored in /var/www/). How would you map some unescaped URL-path to the file system? There's no way to make the unescaping process optional for a physical path in directory context.

URL-path and QueryString have different rules for encoding. The QueryString is left untouched (by browser [except spaces] and server), while reserved and special chars in the URL-path must be requested hex-encoded by the client. Apache unescapes the URL-path in order to process the request.

A way to soften this problem would be a map function which encodes all non-[a-zA-Z0-9/,._-] characters into their %XX hex representation, as discussed above.

If you need the unescaped URI with all its consequences, use the server variable THE_REQUEST, which contains the full untouched request string, like

    GET /foo%20bar?foo=bar HTTP/1.1

BTW: You can also analyze $_SERVER['REQUEST_URI'] within your PHP script and set the variable 'title' there. That would be another workaround for scripts (typo3 is using this method).
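The more generic map function suggested in this comment, one that hex-encodes every character outside [a-zA-Z0-9/,._-], could be sketched roughly as follows. This is only an illustration in plain C, not an existing mod_rewrite function: the names `is_safe` and `escape_unsafe` are made up, and `malloc` is used in place of pool allocation so the example stands on its own.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true for characters in [a-zA-Z0-9/,._-]. */
static int is_safe(unsigned char c)
{
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9')) {
        return 1;
    }
    return c != '\0' && strchr("/,._-", c) != NULL;
}

/* Hypothetical generic escaper: returns a newly allocated copy of in
 * with every unsafe byte written as %XX (uppercase hex), or NULL if
 * the allocation fails. The caller frees the result. */
static char *escape_unsafe(const char *in)
{
    static const char hex[] = "0123456789ABCDEF";
    /* Worst case: each byte becomes three bytes, plus the NUL. */
    char *out = malloc(3 * strlen(in) + 1);
    char *d = out;

    if (out == NULL) {
        return NULL;
    }
    for (; *in != '\0'; ++in) {
        unsigned char c = (unsigned char)*in;

        if (is_safe(c)) {
            *d++ = (char)c;
        }
        else {
            *d++ = '%';
            *d++ = hex[c >> 4];
            *d++ = hex[c & 0x0F];
        }
    }
    *d = '\0';
    return out;
}
```

A function along these lines would cover both the & and the # cases raised earlier in the thread, since neither character is in the safe set.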
*** Bug 39739 has been marked as a duplicate of this bug. ***
*** This bug has been marked as a duplicate of 23295 ***
This PR is an enhancement request to implement a new internal map function, which still needs to be written in a more generic form.