Bug 44922 - RewriteRules in .htaccess erroneously inject PATH_INFO
Status: RESOLVED DUPLICATE of bug 38642
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_rewrite
Version: 2.2.8
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
Reported: 2008-05-02 06:57 UTC by Aleksander Budzynowski
Modified: 2008-08-14 02:42 UTC (History)

Attachments
RewriteLog (9.47 KB, text/plain)
2008-05-02 06:57 UTC, Aleksander Budzynowski

Description Aleksander Budzynowski 2008-05-02 06:57:04 UTC
Created attachment 21906
RewriteLog

The problem, superficially:

If multiple RewriteRules within a .htaccess file match, extra copies of PATH_INFO may accumulate at the end of the URI (depending on whether or not the substitutions include appropriate backreferences to the matched string).

Because there are many configurations that will be somehow affected by this problem, there is no simple cause and effect. I have provided one example below.


In more depth:

This is my best guess from looking at the mod_rewrite source.

When using mod_rewrite in a per-directory context, r->path_info is appended to ctx->uri prior to *each* RewriteRule. If a RewriteRule does not match, this is discarded and has no effect. If a RewriteRule does match, however, the entire substitution is incorporated into r->filename.

This means that subsequent rules will get a "tailgating" copy of PATH_INFO. If more rules match then this can get worse.

Now, each matching rule will either include (and possibly modify) the PATH_INFO portion in the substitution, or will leave it out - in either case it doesn't make sense for further rules to have it re-appended.

This is the code that does the appending:

        if (r->path_info && *r->path_info) {
            rewritelog((r, 3, ctx->perdir, "add path info postfix: %s -> %s%s",
                        ctx->uri, ctx->uri, r->path_info));
            ctx->uri = apr_pstrcat(r->pool, ctx->uri, r->path_info, NULL);
        }

(This agrees with the RewriteLog)


Tested on:

Linux (multiple boxes); Apache versions 2.2.8, 2.2.3, 2.0.61

I think this problem has been reported before (see http://archive.apache.org/gnats/7879 which claims it was fixed in 2.0.30... but it's still happening)


Example:

Here's an example of an affected configuration. This comes from a .htaccess file placed in DocumentRoot. It is supposed to replace all underscores in a URI with hyphens.

RewriteEngine On
RewriteBase /
RewriteRule ^(.*)_(.*)$ $1-$2 [N]

Make a request for "/_f_o_o_" and it will be correctly rewritten to "/-f-o-o-". (That's because PATH_INFO is empty.)

Make a request for "/_f_o_o_/bar" and it will be rewritten to "/-f-o-o-/bar/bar/bar/bar". (That is, unless you happen to have a _f_o_o_ directory, in which case PATH_INFO will be empty and the rewriting will work as desired.)

Note that there are five underscores but only four copies of PATH_INFO - this is because the first time the rule matches, appending PATH_INFO is correct behaviour because r->filename does not include it.

Make a request for "/foo/b_ar" and an infinite loop will ensue, since every time an underscore is replaced, a new one will be appended prior to the next rule.

See the attached RewriteLog, which contains rewrite information for the first two requests.
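
The three requests above can be reproduced outside Apache with a small Python model. This is only a sketch of the behaviour described in this report, not mod_rewrite's actual code: `simulate_n_loop` and its parameters are hypothetical names, `filename` stands for the per-directory part of r->filename, and Python's `re` module stands in for Apache's regex engine.

```python
import re

# Models: RewriteRule ^(.*)_(.*)$ $1-$2 [N]
RULE = re.compile(r"^(.*)_(.*)$")


def simulate_n_loop(filename, path_info, max_passes=50):
    """Hypothetical model of the per-directory [N] loop described in this report."""
    matched = False
    for _ in range(max_passes):
        # The (possibly stale) PATH_INFO is re-appended before every pass.
        uri = filename + path_info
        m = RULE.match(uri)
        if m is None:
            # A non-matching pass discards the appended copy.
            return filename if matched else uri
        # A matching pass stores the whole substitution as the new filename,
        # PATH_INFO included, but path_info itself is left untouched.
        filename = m.group(1) + "-" + m.group(2)
        matched = True
    raise RuntimeError("rewrite loop did not terminate")


print(simulate_n_loop("_f_o_o_", ""))
print(simulate_n_loop("_f_o_o_", "/bar"))
```

Under these assumptions the first call gives "-f-o-o-" and the second gives "-f-o-o-/bar/bar/bar/bar". Calling it with ("foo", "/b_ar") never stops matching, so the model gives up at the pass limit and raises RuntimeError.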

As I have said, there are many ways to be affected by this problem - the above method may have a simple workaround, but there will be more complex cases where workarounds are difficult.


Thoughts on solution:

Once a substitution is made, PATH_INFO is essentially rendered invalid: it will contain something not consistent with the URI. It would be quite easy to set r->path_info to an empty string if any rule matched, and that should solve the problem. Otherwise, a flag could be added (to rewrite_ctx perhaps) indicating whether or not any rules have matched yet.
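
To illustrate the first idea (clearing PATH_INFO once any rule has matched), here is a minimal Python model of the loop with that change applied. It is a sketch under assumptions, not a patch: `simulate_with_fix` is a hypothetical name, `filename` stands for the per-directory part of r->filename, and the rule is the underscore example from above.

```python
import re

# Models: RewriteRule ^(.*)_(.*)$ $1-$2 [N]
RULE = re.compile(r"^(.*)_(.*)$")


def simulate_with_fix(filename, path_info, max_passes=50):
    """Hypothetical model: same loop, but PATH_INFO is cleared after a match."""
    for _ in range(max_passes):
        # PATH_INFO is still merged in before the rule is tried.
        uri = filename + path_info
        m = RULE.match(uri)
        if m is None:
            return uri
        filename = m.group(1) + "-" + m.group(2)
        # Proposed change: the substitution already contains (or deliberately
        # dropped) the old PATH_INFO, so it must not be appended again.
        path_info = ""
    raise RuntimeError("rewrite loop did not terminate")


print(simulate_with_fix("_f_o_o_", "/bar"))
print(simulate_with_fix("foo", "/b_ar"))
```

In this model the two requests come out as "-f-o-o-/bar" and "foo/b-ar", and the second one no longer loops.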
Comment 1 Aleksander Budzynowski 2008-08-02 00:43:51 UTC
I know this problem is a bit tricky to get one's head around, but after three months I think it's time to make some noise again.

I think the reason this bug wasn't reported ages ago is because people simply attribute it to the mysteries of mod_rewrite. But this behaviour is beyond mysterious - it's illogical and totally undocumented, and I'm certain that this behaviour was not intended by those who designed mod_rewrite.

Also, the problem only affects .htaccess files. And the people in the best position to spot and report the problem (that is, people who have a decent understanding of mod_rewrite) tend to have access to httpd.conf, and avoid .htaccess files, and so won't notice anyway.


I really don't see any way to USE this behaviour, so correcting it shouldn't break any existing installations. It might break any workarounds that are in place (although if people were aware of the bug one hopes they would report it).

Fixing this bug would, I believe, significantly narrow the divide between httpd.conf and .htaccess.

I'm willing to work on a fix for this myself but I really would like someone else to at least confirm the bug. It should not be hard to reproduce my example.


A slight correction to the problem description:
"Note that there are five underscores but only four copies of PATH_INFO"
should be
"Note that there are four underscores but only three superfluous copies of PATH_INFO"


Thanks,
Aleksander Budzynowski
Comment 2 William A. Rowe Jr. 2008-08-02 00:56:59 UTC
"It's illogical and totally undocumented, and I'm certain that this
behaviour was not intended by those who designed mod_rewrite."

Perhaps you misunderstood mod_rewrite; it is designed to accomplish
everything under the multiverse ;-)  This is after all the swiss army
knife of httpd.

Seriously, commenting only so I go back and review the bug at a future date.
Comment 3 Aleksander Budzynowski 2008-08-02 03:36:05 UTC
"This is after all the swiss army knife of httpd."

And within .htaccess files it is more like a pair of plastic scissors because of this bug.


Let me try to better explain the cause of the problem:

I have only seen it happen during RewriteRules in .htaccess files. This late in the process, if the request is for a file in a non-existent directory (which I might add is the kind of situation you often want to use mod_rewrite for), the path will be split across r->filename and r->path_info.

Clearly the two parts need to be merged before a substitution is made. The current code does this.

After a rule matches, the entire resultant path is saved to r->filename. This renders the contents of r->path_info invalid. However, the code does not do anything about this! If any subsequent rules match, the out-of-date r->path_info will be injected again. Herein lies the problem.
Comment 4 Eric Covener 2008-08-02 05:14:52 UTC
(In reply to comment #3)
> "This is after all the swiss army knife of httpd."
> 
> And within .htaccess files it is more like a pair of plastic scissors because
> of this bug.

The behavior shouldn't be that limiting, a more careful pattern can be used to e.g. not capture PATH_INFO and re-insert it into the substitution.

Comment 5 Aleksander Budzynowski 2008-08-02 16:23:59 UTC
"The behavior shouldn't be that limiting, a more careful pattern can be used to
e.g. not capture PATH_INFO and re-insert it into the substitution."

You undervalue plastic scissors. But you're right in that workarounds exist.

However, once you venture into looping territory (the N flag), these workarounds start getting ugly. You have to allow path_info to be appended just once at the start of the process.


And to even do anything like that, one needs to know about this issue. It's undocumented.
Comment 6 Eric Covener 2008-08-02 17:28:55 UTC
> However, once you venture into looping territory (the N flag), these
> workarounds start getting ugly. You have to allow path_info to be appended just
> once at the start of the process.

The N-flag isn't necessary though, right? You still get the failure in per-directory context by virtue of the per-directory running over and over until there are no changes?

Just want to be sure it's not unique to the tighter loop of the N flag.
Comment 7 Aleksander Budzynowski 2008-08-02 17:41:24 UTC
"The N-flag isn't necessary though, right?"

No, the N-flag only makes workarounds more complicated.

The problem happens if all these conditions are met:

- in per-dir (.htaccess) context
- PATH_INFO contains something
- more than 1 RewriteRule matches
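
A small Python model shows these conditions without the N flag. It is a sketch of the behaviour as I understand it, not mod_rewrite's code: `apply_rules` and the two example rules are hypothetical, `filename` stands for the per-directory part of r->filename, and the substitutions use Python's `\1` where mod_rewrite would use `$1`.

```python
import re


def apply_rules(filename, path_info, rules):
    """Hypothetical model: one pass over an ordered per-directory ruleset."""
    matched = False
    for pattern, substitution in rules:
        # PATH_INFO is appended before each rule, even after an earlier match.
        uri = filename + path_info
        new_uri, count = re.subn(pattern, substitution, uri, count=1)
        if count:
            # The whole substitution becomes the new filename.
            filename = new_uri
            matched = True
    return filename if matched else filename + path_info


# Roughly: RewriteRule ^a(.*)$ b$1   and   RewriteRule ^b(.*)$ c$1
RULES = [(r"^a(.*)$", r"b\1"), (r"^b(.*)$", r"c\1")]

print(apply_rules("a", "/x", RULES))      # two rules match, PATH_INFO set
print(apply_rules("a", "", RULES))        # PATH_INFO empty
print(apply_rules("a", "/x", RULES[:1]))  # only one rule matches
```

In this model only the first call, where all three conditions hold, picks up an extra copy ("c/x/x"); the other two give "c" and "b/x".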
Comment 8 Bob Ionescu 2008-08-14 02:42:02 UTC
This seems to be a duplicate of bug 38642

You may port the ugly patch provided there with r->notes to 2.2.9/trunk, it works at least.

> I think the reason this bug wasn't reported ages ago is because people simply

It was reported in 2006.  ;-) I'll mark this bug as a dupe, because at least a patch is provided in 38642.

*** This bug has been marked as a duplicate of bug 38642 ***