Testcase: -> RewriteRules in per-dir context, /.htaccess (root) -> used RewriteRules: RewriteEngine on RewriteCond %{REQUEST_URI} !^/(test|robots\.txt) RewriteRule ^(.*) test/$1 [C] RewriteCond %{REQUEST_FILENAME} !-f RewriteCond %{REQUEST_FILENAME} !-d RewriteRule ^test/(.*)$ /test/index.php?q=$1 [L,QSA] Description: Eyery Request should go to /test/... and furthermore if the file does not exist in /test/..., it should go to /test/index.php?q=... -> Request: /CSS/style.css -> expected result: /test/index.php?q=CSS/style.css (all cond. are true) -> real result: /test/index.php?q=CSS/style.css/style.css Problem: see attached Rewritelog The 1st RewriteRule rewrites the request to the filepath test/CSS/style.css while the 2nd RewriteRule matches now against test/CSS/style.css/style.css and not against the expected filepath test/CSS/style.css, because the "/style.css"-part is handled as path info, which is added again to the new filepath. I would expect once a substitution occurs, path info from the original request isn't added anymore in this round of processing.
Created attachment 17689 [details] Rewritelog
A possible second example for verification purposes: In per-directory .htaccess: RewriteRule ^(.*)/index\.html$ $1/ The RewriteLog I see (stripped down and paraphrased for clarity): (3) add path info postfix: /document/root/foo -> /document/root/foo/index.html (3) strip per-dir prefix: /document/root/foo/index.html -> foo/index.html (3) applying pattern '^(.*)/index\.html$' to uri 'foo/index.html' (2) rewrite foo/index.html -> foo/ (3) add per-dir prefix: foo/ -> /document/root/foo/ (3) add path info postfix: /document/root/foo/ -> /document/root/foo//index.html [...] The expected behavior is that 'foo/index.html' would be rewritten to 'foo/', however the "add path info postfix" step just adds it right back on.
*** Bug 33022 has been marked as a duplicate of this bug. ***
Created attachment 18910 [details] Rush patch used some time ago against 2.0.55 I didn't port it into trunk since I hope that there's another method instead of using r->notes.
*** Bug 41423 has been marked as a duplicate of this bug. ***
This looks very similar to the old bug http://archive.apache.org/gnats/7879.
I'm still experiencing this bug in 2.2.6.
*** Bug 44922 has been marked as a duplicate of this bug. ***
I just wasted a good amount of time re-discovering and working around this bug (seven years after it was initially reported?!), as apparently many others continue to do (since it keeps popping up as a new bug) and I felt I should add my voice to the call to fix it, lest it continue to be neglected. Why hasn't the mod_rewrite team fixed such a simple bug after so long? Why hasn't anyone even taken ownership of this bug (still has status NEW after 2.5 years on bugzilla?) Is there anyone currently working on mod_rewrite, anyone at all? I don't think the masses of Apache users such as myself should have to apply a patch that's been out for more than two years - I'm not familiar enough with the code to immediately see the non-idealities of the patch, but if the patch is "good enough", please just roll it out and worry about perfecting it later. Your users will thank you.
While I agree totally with Seth in that this bug should have been fixed long ago, we cannot simply rush a patch through. This needs to be done properly. Because if there are any problems with the patch, who knows how long *they* will take to be addressed? For the purpose of helping anyone who might fix this bug, I will supply some of the details that I provided when I reported the bug. The problem: If multiple RewriteRules within a .htaccess file match, unwanted copies of PATH_INFO may accumulate at the end of the URI. In more depth: When you make a request for a file in a directory that doesn't exist, Apache divides the URI into the "real" part (r->filename) and the "virtual" part (r->path_info). This has happened by the time per-directory (.htaccess) RewriteRules are ready to be processed, which means mod_rewrite has to put them back together to reconstruct the original URI. This is what happens. r->path_info is appended to ctx->uri prior to *each* RewriteRule. If a RewriteRule (including its RewriteConds) does not match, ctx->uri is discarded and nothing bad happens. If a RewriteRule does match, however, the entire substitution is incorporated into r->filename. But r->path_info is not changed! This means that subsequent rules will get an extra copy of PATH_INFO. If more rules match then this can get worse. Note that PATH_INFO *should* be appended before the first matching RewriteRule, because it forms part of the URI. However, afterwards it should not be appended again. Example: This comes from a .htaccess file placed in DocumentRoot. It is supposed to replace all underscores in a URI with hyphens. RewriteEngine On RewriteBase / RewriteRule ^(.*)_(.*)$ $1-$2 [N] Make a request for "/_f_o_o_" and it will be correctly rewritten to "/-f-o-o-". (That's because PATH_INFO is empty.) Make a request for "/_f_o_o_/bar" and it will be rewritten to "/-f-o-o-/bar/bar/bar/bar". (That is, unless you happen to have a _f_o_o_ directory, in which case PATH_INFO will be empty and the rewriting will work as desired.) Note that there are four underscores but only three erroneous copies of PATH_INFO - this is because the first time the rule matches, appending PATH_INFO is correct behaviour. Make a request for "/foo/b_ar" and an infinite loop will ensue, since every time an underscore is replaced, a new one will be appended prior to the next rule. (See my bug report at https://issues.apache.org/bugzilla/show_bug.cgi?id=44922 for a RewriteLog.) The current patch: Looks pretty good to me. It basically uses a flag to indicate whether a substitution has been made yet, and clears this flag when it ought to get "reset". I would however check perdir *before* making each calls to apr_table_xxx, as these calls are relatively slow. The flag is stored in r->notes - I can't see where better to put it. It can't go in the ctx struct because the scope of that struct is a single rewrite list. (This is bad because multiple .htaccess files can match a single request.) Although for performance, a "cache" of the flag could be stored in the ctx struct, and saved to r->notes in between rewrite lists.
> The flag is stored in r->notes - I can't see where better to put it. It can't > go in the ctx struct because the scope of that struct is a single rewrite list. > (This is bad because multiple .htaccess files can match a single request.) > Although for performance, a "cache" of the flag could be stored in the ctx > struct, and saved to r->notes in between rewrite lists. I am leaning towards whacking r->path_info as soon as we know we've effectively made it useless by replacing r->filename in per-directory context, rather then thinking harder about when we should spill r->path_info into ctx->uri by saving info away in some previous step. @@ -3981,6 +3986,11 @@ /* Now adjust API's knowledge about r->filename and r->args */ r->filename = newuri; + + if (ctx->perdir && r->path_info && !(p->flags & RULEFLAG_KEEPPATHINFO)) { + r->path_info = '\0'; + } But I don't pretend to understand how someone might be relying on this path_info from the original request floating around, so the meaning of the flag might have to be inverted so that the current behavior is default. Who knows how many .htaccess files are carefully working around this behavior by e.g. not propogating pieces of the URL that look like the existing %{PATH_INFO} over in the substitution
>> I am leaning towards whacking r->path_info as soon as we know we've effectively made it useless by replacing r->filename in per-directory context, rather then thinking harder about when we should spill r->path_info into ctx->uri by saving info away in some previous step. That's exactly what I thought at first. But then I found this: http://213.11.80.10/manual/cgi_path.html And I quote: "Apache 1.2 and later now determine SCRIPT_NAME and PATH_INFO by looking directly at the URL" So maybe it's not such a good idea.
(In reply to comment #12) > >> I am leaning towards whacking r->path_info as soon as we know we've effectively made it useless by replacing r->filename in per-directory context, rather then thinking harder about when we should spill r->path_info into ctx->uri by saving info away in some previous step. > > That's exactly what I thought at first. But then I found this: > http://213.11.80.10/manual/cgi_path.html > > And I quote: "Apache 1.2 and later now determine SCRIPT_NAME and PATH_INFO by > looking directly at the URL" > > So maybe it's not such a good idea. > I thought that since this is per-directory, PT-like behavior is implied so whenever we change the filename and re-inject into apache, so PATH_INFO is very soon to be "fixed" by the core. The only time it lasts longer is when we have another rule or two to look at before re-injecting.
I'm encouraged by the recent activity on this bug, but with all the discussion about possible approaches and their potential ramifications, and the link to the reference describing version 1.2 and later behavior, I'm left wondering: isn't there a set of unit test cases for this function to unequivocally define its requirements, and indicate when an acceptable solution has been found? This seems like a project that would have a large, if not exhaustive, stable of test cases built up by now.
(In reply to comment #14) > isn't there a set of unit test cases for this function to unequivocally define > its requirements, and indicate when an acceptable solution has been found? Nothing that can be relied upon for this purpose. Any current functional regression tests pass today, and we wouldn't pretend to define the requirements for something that could be relied on by any number of .htaccess files for such a long span of time. There is also the issue of interaction with what third-party modules or CGI might expect.
(In reply to comment #13) > I thought that since this is per-directory, PT-like behavior is implied so > whenever we change the filename and re-inject into apache, so PATH_INFO is very > soon to be "fixed" by the core. > > The only time it lasts longer is when we have another rule or two to look at > before re-injecting. I could be wrong, but the way I understand it, PATH_INFO is just derived from the original URL - before any rewriting occurs. And is never "fixed".
(In reply to comment #16) > (In reply to comment #13) > > I thought that since this is per-directory, PT-like behavior is implied so > > whenever we change the filename and re-inject into apache, so PATH_INFO is very > > soon to be "fixed" by the core. > > > > The only time it lasts longer is when we have another rule or two to look at > > before re-injecting. > > I could be wrong, but the way I understand it, PATH_INFO is just derived from > the original URL - before any rewriting occurs. And is never "fixed". Yes, derived from the physical filename after the request has been mapped to the file system. I think Eric meant that it is fixed in the redirect processing (new file system mapping, new path info for the internal redirect processing). I think erasing path_info is not a good idea. Even if it seems to be useless (since you can check the URL-path), someone (either content handlers or e.g. RewriteConditions) may relay on that ENV. That's why I've chosen to leave it untouched but prevent further appending if a substitution occurred. I cannot imagine a case where "not appending path_info after a subst. was applied but leaving r->path_info untouched" can break any case – either at server/rewrite rule side nor at script side. No one uses a construct of rewrite rules, which expects continuing appending of path_info – and I've seen a lot of rules in a german discussion forum about mod_rewrite. It's the opposite. There were questions "what went wrong" when it comes to the behavior described in this PR. (In reply to comment #10) > I would however check perdir *before* making each calls to apr_table_xxx, as > these calls are relatively slow. As I wrote – rush. :-) I think most calls were made in sections which are touched by the fix-up hook (i.e. directory context) only. May be a check for path_info != NULL would be better.
> for something that could be relied on by any number of .htaccess files for such > a long span of time. I doubt many .htaccess rely on [that is, work-around] current behaviour, because: -The behaviour differs from httpd.conf behaviour, -is undocumented, and -when I posted about this to the (English) Apache users mailing list, nobody knew about the issue. > As I wrote – rush. :-) I think most calls were made in sections which are > touched by the fix-up hook (i.e. directory context) only. May be a check for > path_info != NULL would be better. Hmm, you're right for the most part. I was thinking you should have done: if (perdir != NULL && apr_table_get(r->notes,"substapplied") && instead of if (apr_table_get(r->notes,"substapplied") && perdir != NULL && except the latest code is structured a bit differently (the whole block already has a perdir check) so this is irrelevant.
*** Bug 46165 has been marked as a duplicate of this bug. ***
I committed an explicit flag to discard PATH_INFO during per-directory processing: http://svn.apache.org/viewvc?rev=728015&view=rev I'm a bit too concerned about breaking convoluted rulesets by changing the default behavior. I did consider changing the default in trunk/2.3.x and having an opt _out_ flag but then the 2.2.x/2.3.x differences would be a pain. I would appreciate review of the doc in the commit (in viewsvn or URL when it's available) because it's difficult to find the right level of detail: http://httpd.apache.org/docs/2.3/mod/mod_rewrite.html#rewriteflags
Another flag (to fix just a bug) makes mod_rewrite yet more complex (which is IMHO a problem). It is good that there is a way to avoid the bug, but as you can see from the last duplicate of this bug, I think many people trapping into this bug are not aware about what causes the problem or what path-info is at all. Further more, many don't have access to the server config and can't get a Rewritelog. I still consider this as a bug and not as a feature, which means that a bug shouldn't need options to keep it. There's a small typo: "or the resulf". If it should be kept that way, consider adding a note to the N-flag about the new flag as well, because this bug is deadly in combination with the N-flag.
*** Bug 40598 has been marked as a duplicate of this bug. ***
In regards to clobbering PATH_INFO: From my experiments, it would seem that PATH_INFO is in fact "regenerated" after a mod_rewrite ruleset has been processed. I'm not quite sure at what stage this occurs, but this would mean that, at least, CGI scripts will not be affected by the change. The only place I can think of that will be affected is RewriteRules and RewriteConds that use %{PATH_INFO} themselves. But I would say they are relying on undocumented behaviour - in any ruleset that allows more than one substitution to occur, the URL goes through a "partial" state and what PATH_INFO should contain at that point isn't really stated anywhere. So I say forget the flag and change default behaviour - documenting it clearly, of course. If someone really wants to preserve PATH_INFO, they can use the env= flag to store it in another variable before making any substitutions. In terms of the code itself: > r->path_info = "\0"; Apache will automatically clean up the old string when the request is done, right? (Just sanity checking here.) Also, why not just use r->path_info = NULL; ? All the rest of the mod_rewrite code supports this. If you have a good reason not to use NULL, then you could just use this: r->path_info = "";
(In reply to comment #23) > In regards to clobbering PATH_INFO: > > From my experiments, it would seem that PATH_INFO is in fact "regenerated" > after a mod_rewrite ruleset has been processed. I'm not quite sure at what > stage this occurs, but this would mean that, at least, CGI scripts will not be > affected by the change. This is the core directory walk processing that happens shortly after the rewritten URL is reinserted. > > So I say forget the flag and change default behaviour - documenting it clearly, > of course. I think that definitely doesn't fly for 2.2 because I think it's quite easy to end up with a ruleset that works under Changing the behavior between releases has complications too because you have to then know where your htaccess is running. Ultimately I'm just more comfortable A) not regressing anyone with a legacy ruleset lurking out in htaccess and B) making the choice explicit in the rule after consulting the doc. > Apache will automatically clean up the old string when the request is done, > right? (Just sanity checking here.) yes, everything in the request_rec is allocated out of a short lived (lifetime of a request) APR pool. > > Also, why not just use thanks, corrected.
> Ultimately I'm just more comfortable A) not regressing anyone with a legacy > ruleset lurking out in htaccess and B) making the choice explicit in the rule > after consulting the doc. I can see where you're coming from, but in my opinion, placing extra burden on users for backwards-compatibility reasons should be avoided as much as possible. I've had an idea: rather than using r->path_info, we add a path_info member to rewrite_ctx, and refer to that within mod_rewrite. It would be initialised to r->path_info, but could be set to NULL when a substitution is made. This way the request's PATH_INFO information is not actually modified by mod_rewrite. Any comments? Earlier I said > The flag is stored in r->notes - I can't see where better to put it. It can't > go in the ctx struct because the scope of that struct is a single rewrite list. I no longer think this is true. r->path_info is regenerated at the end of a rewrite list (providing at least one rule matched) so we can start afresh for each rewrite list.
How I can to use this path on windows version of apache 2.2? When this bug will be fixed in binaries?
Sorry. Needs use DPI flag to fix this like.
marking fixed, Release in 2.2.12
This is not fixed, because the fix is hiding behind a `DPI` flag. It has just wasted several hours of my life, and presumably countless others', because it is not properly fixed. Please make the correct behaviour the default. I cannot see why anyone would want the broken behaviour, but that would be the behaviour to have a flag to enable.
To expand on this, the bug is subtle, as it only occurs when part of the path matches a file or directory, thus many users will have this latent bug if they are not using the DPI flag and have not tested that specific case.
Please open a new bug if you want to suggest a default change for a future release. This can't safely be changed in a stable release.