Bug 52000 - RewriteRule documentation unclear/misleading about substitution
Status: RESOLVED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: Documentation
Version: 2.5-HEAD
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: HTTP Server Documentation List
Reported: 2011-10-10 00:39 UTC by Philippe Cloutier
Modified: 2012-04-26 16:50 UTC

Description Philippe Cloutier 2011-10-10 00:39:11 UTC
The description of RewriteRule at
http://httpd.apache.org/docs/trunk/en/mod/mod_rewrite.html#rewriterule
contains:

The URL is completely replaced by the Substitution and the rewriting process continues until all rules have been applied, or it is explicitly terminated by a L flag, or other flag which implies immediate termination, such as END or F.

This assumes that rules operate on URLs. In reality, as the section
explains earlier, they may operate on relative filesystem paths or on URL-paths:

What is matched?

In VirtualHost context, The Pattern will initially be matched against the part of the URL after the hostname and port, and before the query string (e.g. "/app1/index.html").

In Directory and htaccess context, the Pattern will initially be matched against the filesystem path, after removing the prefix that lead the server to the current RewriteRule (e.g. "app1/index.html" or "index.html" depending on where the directives are defined).


Note that something is missing in "or other flag" (maybe "or another flag").
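
To illustrate the distinction drawn in "What is matched?", here is a minimal Python sketch of which string a Pattern would first see in each context. It is only a model of the quoted documentation, not mod_rewrite's own code. The function name pattern_input and the dir_prefix parameter are hypothetical, and for simplicity the sketch works on URL-path strings even though the documentation describes the per-directory case in terms of a filesystem path.

```python
def pattern_input(url_path, context, dir_prefix="/"):
    """Return the string a RewriteRule Pattern is first matched against.

    Simplified model of the documented behaviour, not mod_rewrite itself.
    dir_prefix is the (hypothetical) URL prefix that maps to the directory
    holding the rule; it is only used in per-directory context.
    """
    # The query string is never part of what Pattern sees.
    path = url_path.split("?", 1)[0]
    if context == "server":
        # VirtualHost context: the full URL-path, leading slash included.
        return path
    if context == "directory":
        # Directory/.htaccess context: the prefix leading to the rule is removed.
        prefix = dir_prefix if dir_prefix.endswith("/") else dir_prefix + "/"
        if not path.startswith(prefix):
            raise ValueError("path is outside the directory prefix")
        return path[len(prefix):]
    raise ValueError("unknown context: " + context)


print(pattern_input("/app1/index.html?x=1", "server"))          # /app1/index.html
print(pattern_input("/app1/index.html", "directory", "/"))      # app1/index.html
print(pattern_input("/app1/index.html", "directory", "/app1"))  # index.html
```

The three calls reproduce the examples from the quoted text: "/app1/index.html" in VirtualHost context, and "app1/index.html" or "index.html" in per-directory context depending on where the directives are defined.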
Comment 1 Rich Bowen 2011-10-10 11:35:08 UTC
I've made another pass at clarifying this. Please let me know what you think.
Comment 2 Philippe Cloutier 2011-10-10 16:27:43 UTC
Thanks Rich. I re-read the section and there is at least one more possibility which is not considered. The rule may be applied to a "URL-path", as explained above:

The Substitution of a rewrite rule is the string that replaces the original URL-path that was matched by Pattern. The Substitution may be a:

file-system path
    Designates the location on the file-system of the resource to be delivered to the client.
URL-path
    A DocumentRoot-relative path to the resource to be served. Note that mod_rewrite tries to guess whether you have specified a file-system path or a URL-path by checking to see if the first segment of the path exists at the root of the file-system. For example, if you specify a Substitution string of /www/file.html, then this will be treated as a URL-path unless a directory named www exists at the root or your file-system, in which case it will be treated as a file-system path. If you wish other URL-mapping directives (such as Alias) to be applied to the resulting URL-path, use the [PT] flag as described below.
Absolute URL
    If an absolute URL is specified, mod_rewrite checks to see whether the hostname matches the current host. If it does, the scheme and hostname are stripped out and the resulting path is treated as a URL-path. Otherwise, an external redirect is performed for the given URL. To force an external redirect back to the current host, see the [R] flag below.
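
The three kinds of Substitution quoted above can be sketched roughly as follows. This is an illustration of the quoted wording only, under stated assumptions: classify_substitution and the root_entry_exists predicate are hypothetical names, the file-system check is injected by the caller rather than performed against a real disk, and flags such as [PT] and [R] are ignored.

```python
from urllib.parse import urlsplit


def classify_substitution(subst, current_host, root_entry_exists):
    """Roughly classify a Substitution string as the quoted text describes.

    root_entry_exists is a caller-supplied (hypothetical) predicate that says
    whether a given name exists at the root of the file-system.
    """
    parts = urlsplit(subst)
    if parts.scheme and parts.netloc:
        # Absolute URL: same host means the path is handled as a URL-path,
        # any other host means an external redirect.
        if parts.hostname == current_host.lower():
            return ("URL-path", parts.path)
        return ("external redirect", subst)
    # Otherwise guess from the first path segment.
    first_segment = subst.lstrip("/").split("/", 1)[0]
    if subst.startswith("/") and root_entry_exists(first_segment):
        return ("file-system path", subst)
    return ("URL-path", subst)


# No /www directory at the file-system root: treated as a URL-path.
print(classify_substitution("/www/file.html", "example.com", lambda name: False))
# A /www directory exists: treated as a file-system path.
print(classify_substitution("/www/file.html", "example.com", lambda name: name == "www"))
# Absolute URL on the current host versus another host.
print(classify_substitution("http://example.com/a.html", "example.com", lambda name: False))
print(classify_substitution("http://other.example/a.html", "example.com", lambda name: False))
```

The host names are placeholders. Passing the existence check in as a function keeps the sketch independent of whatever happens to be on the local disk.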
Comment 3 Daniel Gruno 2012-04-03 08:25:59 UTC
Pinging on this issue.
From what I can read in the docs, the suggestions have been added - is this observation correct, and if so, can we close this ticket sometime soon?
Comment 4 Daniel Gruno 2012-04-03 13:53:58 UTC
After reviewing the changes, I am closing this ticket as resolved, as I am satisfied that the requests made by the ticket creator have been followed up by appropriate changes in the documentation.
Comment 5 Philippe Cloutier 2012-04-10 16:03:48 UTC
Hi Daniel,
the original problem was partly addressed, but I find that the problem described in comment 2 remains. The section still contains:

The URI or file path (see "What is matched?", above) is completely replaced by the Substitution and the rewriting process continues until all rules have been applied, or it is explicitly terminated by an L flag, or other flag which implies immediate termination, such as END or F.

However, as explained in "What is matched?", a URL-path can also be matched:

In VirtualHost context, The Pattern will initially be matched against the part of the URL after the hostname and port, and before the query string (e.g. "/app1/index.html").

In fact, "What is matched?" does not mention the possibility of directly matching a URI.
Comment 6 Daniel Gruno 2012-04-11 17:10:13 UTC
So, to sum up comment #2: You would like for us to say URL-path instead of URI, when speaking of what will be replaced? 
I could argue that a URI can be both a URL and a URL-path, but after some discussion with a fellow #httpd staffer, I have been convinced that saying URL-path might just be the better solution.

Awaiting your reply and such.
Comment 7 Philippe Cloutier 2012-04-11 20:00:48 UTC
Daniel, yes.
I also do not see how a URL-path could be a URI. According to RFC 3986 section 1.1.1:

Each URI begins with a scheme name, as defined in Section 3.1, that refers to a specification for assigning identifiers within that scheme.

Therefore, a URL-path, for example "/index.html", is not a URI according to the RFC.
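
The point about the scheme can be checked mechanically. The short sketch below uses Python's standard urllib.parse.urlsplit to test whether a string carries a scheme; the helper name has_scheme is hypothetical, and "relative reference" is the term RFC 3986 uses for a reference without a scheme.

```python
from urllib.parse import urlsplit


def has_scheme(ref):
    """Return True if ref starts with a scheme, as RFC 3986 requires of a URI."""
    return bool(urlsplit(ref).scheme)


for ref in ("http://localhost/index.html", "/index.html"):
    kind = "URI (has a scheme)" if has_scheme(ref) else "relative reference (no scheme)"
    print(f"{ref}: {kind}")
```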
Comment 8 Daniel Gruno 2012-04-11 20:10:38 UTC
Okay, I have made the adjustments to the trunk docs, so they now refer to "URL-path" instead of "URI", and if you're happy with it, I'll make the changes to the 2.4 docs as well.
Comment 9 Philippe Cloutier 2012-04-11 22:04:50 UTC
Thank you Daniel. There is the same problem at the beginning of the section:

Pattern is a perl compatible regular expression. On the first RewriteRule it is applied to the (%-decoded) URL-path of the request; subsequent patterns are applied to the output of the last matched RewriteRule.

In directory context, the first RewriteRule is matched against a filesystem path. By the way, I find it slightly misleading to say that the pattern is *applied* to the URL-path, since the URL-path is not really modified by the pattern. I would say the pattern is matched against it.

While we're at it, in:

In Directory and htaccess context, the Pattern will initially be matched against the filesystem path, after removing the prefix that lead the server to the current RewriteRule (e.g. "app1/index.html" or "index.html" depending on where the directives are defined).

"lead" should read either "led" or "leads".
Comment 10 Daniel Gruno 2012-04-12 07:16:47 UTC
I have made the changes you suggested and even fixed a case of bad grammar.
Anything else, while we're at it, or can we close up this ticket?
Comment 11 Philippe Cloutier 2012-04-13 03:51:23 UTC
Well, this is abusing this report, but if we're at it...

I don't think it is correct to say:
On the first RewriteRule, it is matched against the (%-decoded) URL-path (or file-path, depending on the context) of the request.

In per-directory context, we don't necessarily have a file-path if I understand correctly. file-path is defined this way:
The path to a file in the local file-system beginning with the root directory as in /usr/local/apache/htdocs/path/to/file.html. Unless otherwise specified, a file-path which does not begin with a slash will be treated as relative to the ServerRoot.

Suppose we get a request for http://localhost/article/7, which we want to be served by /usr/local/apache/htdocs/articles.php. We define a per-directory rewrite rule in /usr/local/apache/htdocs/.htaccess. That rule will match against "article/7". Yet, there is no /usr/local/apache/article/7. In fact, there is no "7" file anywhere.
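
As a rough illustration of this scenario, the Python sketch below matches the per-directory string "article/7" against a pattern. The rule shown in the comment, including its pattern and the id query parameter, is a hypothetical example and not taken from the documentation; Python's re module stands in for PCRE here. The sketch never consults the file-system.

```python
import re

# Hypothetical rule, as it might appear in /usr/local/apache/htdocs/.htaccess:
#   RewriteRule ^article/(\d+)$ articles.php?id=$1 [L]
PATTERN = re.compile(r"^article/(\d+)$")


def rewrite(per_dir_path):
    """Return the substituted string, or None if the pattern does not match."""
    m = PATTERN.match(per_dir_path)
    if m is None:
        return None
    return "articles.php?id=" + m.group(1)


# What the rule sees for http://localhost/article/7 in per-directory context.
print(rewrite("article/7"))   # articles.php?id=7
# With the leading slash (as in VirtualHost context) this pattern does not match.
print(rewrite("/article/7"))  # None
```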


Also, http://httpd.apache.org/docs/trunk/en/rewrite/intro.html#rewriterule contains:

The Pattern is always a regular expression matched against the URL-Path of the incoming request (the part after the hostname but before any question mark indicating the beginning of a query string).

This is the same problem as we had in RewriteRule's documentation before the previous commit. I suggest:

The Pattern is a regular expression. It is initially matched against the URL-path of the incoming request (the part after the hostname but before any question mark indicating the beginning of a query string) or, in per-directory context, against the request's path relative to the directory for which the rule is defined.

But that could probably be clearer.


And another error, of syntax this time, about Substitution. We have:

file-system path
    Designates the location on the file-system of the resource to be delivered to the client. Substitutions are only treated as a file-system path when the rule is configured in server (virtualhost) context and the first component of the path in the substitution is exists in the file-system

Something's wrong in the last sentence. I guess "is exists" should read "exists".


There is also a problem in the part mentioning RewriteBase, but that one is more delicate; I shall remember to open a new report about it.
Comment 12 Daniel Gruno 2012-04-13 09:37:21 UTC
Whether or not the actual file exists does not, at the point of mod_rewrite's hooking into the process, matter where matching is concerned, so I think your first argument is moot. In terms of initially matching against a request, a file path is still valid even though the file itself does not exist. It is only when you either start substituting or when the file handler kicks in, that it matters whether the path translates to an actual file.

As for the rest, I have made the necessary adjustments to the two documents.
Comment 13 Daniel Gruno 2012-04-13 14:16:21 UTC
I'm going to close this ticket now, as we have already moved way beyond the scope of the original complaint. If you still have issues with the rewrite documents, you are welcome to open a new ticket, but I must ask that you do so for each specific subject that comes to mind, so it won't end up as a never-ending ticket like this one.

Having each specific subject in a separate thread would also greatly ease the discussion, as we know to what we are replying, and within which scope we're discussing the issue.

You are, of course, also very welcome to submit a patch for the documents you want changed, so we won't have to muck about ourselves when a few wordings need changing here and there.
Comment 14 Philippe Cloutier 2012-04-13 17:16:29 UTC
Thank you once more for your changes, Daniel.

I opened a new request about a minor problem left in the introduction (bug 53080).

But I still think that the second parenthesis in "On the first RewriteRule, it is matched against the (%-decoded) URL-path (or file-path, depending on the context) of the request." is wrong.

I forgot to mention, just below we expand on that:
In Directory and htaccess context, the Pattern will initially be matched against the filesystem path, after removing the prefix that led the server to the current RewriteRule (e.g. "app1/index.html" or "index.html" depending on where the directives are defined).

I find this less bad, although one may ask *which* filesystem path this refers to.

Let me try to put the issue differently: What is the file-path of a request?
Comment 15 Philippe Cloutier 2012-04-26 16:50:19 UTC
I reported the "file-path" problem in #53153. I also opened #53152 about the part which mentions RewriteBase.