Bug 52181 - mod_rewrite Technical Details Ruleset Processing diagram unclear about transition between rules
Status: VERIFIED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: Documentation
Version: 2.5-HEAD
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: HTTP Server Documentation List
Reported: 2011-11-14 18:48 UTC by Philippe Cloutier
Modified: 2012-04-12 21:40 UTC (History)

Attachments
Updated rewrite chart (83.01 KB, image/png)
2012-04-10 08:07 UTC, Daniel Gruno
Updated (again) rewrite chart (84.36 KB, image/png)
2012-04-11 18:51 UTC, Daniel Gruno
Updated rewrite chart, once more (104.30 KB, image/png)
2012-04-12 07:30 UTC, Daniel Gruno

Description Philippe Cloutier 2011-11-14 18:48:28 UTC
The diagram in http://httpd.apache.org/docs/trunk/en/rewrite/tech.html#InternalRuleset is fairly clear on what happens for a rule, but unclear about what happens after a rule finishes. That part has a "Next" process that links to the "RewriteRule" process with a dotted transition labeled "new URL".
Nothing explains what a dotted transition means. What the Next activity does is unclear.
I imagine adding a "Last rule?" decision would clarify.
Note that a rule's substitution is not necessarily a URL.
Comment 1 Daniel Gruno 2012-04-02 12:14:18 UTC
I'm unsure what you mean by "a rule's substitution is not necessarily a URL" in terms of contradictions in the flow chart. I can however relate to the fact that it seems like an endless loop in the figure, and some final "back to httpd" should come into play.

Now, I'm not the most awesome doodle artist, but I did draw up a suggestion for what it could say instead: http://www.humbedooh.com/apache/rewrite_process.png
The notable difference is that I replaced "Next" with "More rules?": if so, go back to RewriteRule; if not, go back to httpd. Is that the sort of clarification you were seeking?
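
For illustration only, the "More rules?" loop described above could be sketched roughly as follows. This is a simplified model, not mod_rewrite's actual implementation: the function name `run_ruleset` and the `(pattern, substitution)` tuples are hypothetical, it uses Python's `re` syntax (with `\1` rather than `$1` back-references), and it ignores RewriteCond and all flags.

```python
import re

def run_ruleset(uri, rules):
    """Walk a list of (pattern, substitution) pairs once, in order.

    Hypothetical model of the proposed chart: each loop iteration is one
    pass through "RewriteRule" and "Check pattern"; running out of rules
    is the "More rules? -> No" exit back to httpd.
    """
    for pattern, substitution in rules:
        match = re.search(pattern, uri)
        if match:
            # The substitution replaces the whole string, with
            # back-references filled in from the match.
            uri = match.expand(substitution)
    return uri  # no more rules: hand the result back to httpd
```

With an empty rule list, or with rules that never match, the input comes back unchanged, which is the "back to httpd" exit the chart was missing.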
Comment 2 Philippe Cloutier 2012-04-09 22:00:30 UTC
Hi Daniel,
This is precisely the clarification I was seeking. Thank you.

I see some issues left in the proposition:

There is still a dotted transition. Please either define what a dotted transition means or do not use it.

The current diagram is clearer (or better, anyway) at distinguishing activities from decisions, which are represented as diamonds. The proposition does not "visually" distinguish activities from decisions. This distinction would be nice to keep (see http://en.wikipedia.org/wiki/Activity_diagram for a possible convention to use).

The proposition still suggests infinite looping. For example, suppose we have a URI and one non-matching rule. We start at URI, go to RewriteRule, to Check pattern, to More rules?, to File or URI, and finally we're back where we started, at URI. I guess "More rules?" should be visited before "RewriteRule".

The proposition is a little less clear about where the activity starts (since there is no longer a node with only an outgoing arrow).
Comment 3 Daniel Gruno 2012-04-10 08:07:31 UTC
Created attachment 28568 [details]
Updated rewrite chart

I see what you mean, and I have updated the chart to reflect those proposed changes. Please let me know if this is more along the lines of what you had in mind, and I'll neaten it up and perhaps commit the changes sometime soon.
Comment 4 Philippe Cloutier 2012-04-10 17:25:32 UTC
Thank you Daniel, this does address most of my remaining concerns with the previous proposition.

Two issues with the current diagram are not completely addressed:

1. It is assumed that there is at least one rewrite rule. This could be fixed by changing the destination of the arrow leaving "Apache receives URI", from RewriteRule to "More rules?". The label for the transition from "More rules?" to RewriteRule ("Yes (pass on new URI)") may not fit though, if there are no rules.


2. While there is now clearly a way for the activity to end (reaching "Serve the file"), I don't think "Filename" is in reality the only way to reach the end. For example, suppose we have one non-matching rule again. We start at "The request", go to "Apache receives URI", to RewriteRule, to "Check pattern", and to "More rules?". Since we have no more rules, we go to "File or URI?", still with our original URI. Then the diagram brings us back to "Apache receives URI", and we're stuck in a loop.

According to the API phases section:
If a substitution is made in per-directory context, a new internal subrequest is issued with the new URL, which restarts processing of the request phases.

So what should probably happen after leaving "File or URI?" with a URI is another decision. If the URI changed, we do go back to "Apache receives URI". If not, we serve the file.
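
A rough sketch of that extra decision, under the same simplifications as any toy model of the chart: `apply_rules`, `handle_request` and the `max_rounds` guard are hypothetical names introduced here, and the code only mirrors the proposed diagram, not the server's real request phases.

```python
import re

def apply_rules(uri, rules):
    """One pass over hypothetical (pattern, substitution) pairs."""
    for pattern, substitution in rules:
        match = re.search(pattern, uri)
        if match:
            uri = match.expand(substitution)
    return uri

def handle_request(uri, rules, max_rounds=10):
    """Re-run the ruleset only while a pass actually changes the URI.

    max_rounds is an assumed safety cap for this sketch, so that a rule
    which always rewrites cannot loop forever.
    """
    for _ in range(max_rounds):
        rewritten = apply_rules(uri, rules)
        if rewritten == uri:
            return uri   # URI unchanged: serve it
        uri = rewritten  # URI changed: back to "Apache receives URI"
    raise RuntimeError("ruleset keeps rewriting the URI")
```

With one non-matching rule, `handle_request` returns the original URI after a single pass instead of looping, which is the case the diagram currently gets wrong.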
Comment 5 Daniel Gruno 2012-04-11 18:51:18 UTC
Created attachment 28586 [details]
Updated (again) rewrite chart

I have made another attempt at capturing what you mean. I have added some conditions and made the "File or URI" segment into a three-way solution for serving both a file-path, a changed URI or an external redirect/proxy request.

Awaiting response and so forth.
Comment 6 André Malo 2012-04-11 19:06:55 UTC
(In reply to comment #5)
> Created attachment 28586 [details]
> Updated (again) rewrite chart
> 
> I have made another attempt at capturing what you mean. I have added some
> conditions and made the "File or URI" segment into a three-way solution for
> serving both a file-path, a changed URI or an external redirect/proxy request.
> 
> Awaiting response and so forth.


It looks a little confusing, but that's how mod_rewrite is anyway ;)

I think a red RewriteCond (or multiple boxes) is missing below the RewriteCond decider.

More comments:

- I'd split it into two graphs: one for per-directory context (which happens during the current "File-path" arrow) and this one for server context. Side note: it's not a subrequest that is issued in directory context, but an internal redirect.

- Maybe we should split even more, like showing the API phases where the rewrites can happen (directory vs. server). This may simplify the other graphs, since they can focus on the ruleset.

- <More Rules> can be skipped with the [L] flag (and PT, too, but only in server context)

- The [N] flag restarts the complete ruleset with the current file-or-uri

- On a completely different topic: is it possible to commit the original graph source, too (not only the resulting PNG)?

nice work!
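
To make the two flag remarks above concrete, here is a small sketch of how [L] and [N] would alter the rule loop. It is only a model of the behaviour as described in this comment: the `(pattern, substitution, flags)` tuples, the function name and the `max_restarts` guard are assumptions made for the example, and PT, RewriteCond and the context differences are left out.

```python
import re

def run_ruleset_with_flags(uri, rules, max_restarts=100):
    """Walk hypothetical (pattern, substitution, flags) triples.

    "L" in flags: stop after this rule matches (skips "More rules?").
    "N" in flags: restart from the first rule with the current value.
    max_restarts is an assumed cap so a bad [N] rule cannot spin forever.
    """
    restarts = 0
    index = 0
    while index < len(rules):
        pattern, substitution, flags = rules[index]
        match = re.search(pattern, uri)
        if match:
            uri = match.expand(substitution)
            if "N" in flags:
                restarts += 1
                if restarts > max_restarts:
                    raise RuntimeError("too many [N] restarts")
                index = 0  # restart the complete ruleset
                continue
            if "L" in flags:
                break      # last rule: leave the ruleset now
        index += 1
    return uri
```

Even this reduced version needs a second exit and a back-edge to the top of the ruleset, which gives a feel for how much the chart would grow if every flag were drawn.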
Comment 7 Daniel Gruno 2012-04-11 19:25:46 UTC
Whoa now, let's not get _that_ complicated with a chart meant to clarify, not further complicate things ;). I appreciate the comments, but it seems to be a bit much for my brain to handle at the moment.

So, I'll leave my little sketch as is, and we can discuss whether it's at least an "upgrade" of sorts to the current diagram.

As for the "raw" imagery, you can find it at http://www.humbedooh.com/apache/ in both pptx (powerpoint) and odp (openoffice impress) formats. If you feel you can improve upon it, or split it up, I'd very much like to see what you can come up with :)
Comment 8 Philippe Cloutier 2012-04-11 20:55:40 UTC
Thank you again Daniel.

The addition of "RewriteRules?" does solve point 1.

As for "Redirect or proxy the contents", I think this does improve the accuracy of the overall process, but I do not think this really solves my issue 2. Here's a test again. 

Let's say we have a request for "http://apache.org/foundation/" and the only rule in the ruleset doesn't match. We still start at "The request", go to "Apache receives URI", to "RewriteRules?", to RewriteRule, to "Check pattern", to "More rules?" and then to "File or URI?". What happens then? First of all, notice that the label "File or URI?" would suggest a boolean choice, which is no longer the case. Anyway, we certainly don't have a file, so we either follow External URL or Internal URI. I am not sure what the exact difference between these is, but I suppose we should follow "Internal URI" in this case. If so, we land on "Apache receives URI", and again, we appear to be in an infinite loop.
Comment 9 Philippe Cloutier 2012-04-11 21:06:47 UTC
I tend to agree with André. The more I read about mod_rewrite, the more I think it's really complicated. It's no wonder the documentation has several issues; this is just *difficult* to document well.

I don't know what the ideal situation should be, but having several diagrams is probably the way to go.

For the short term, I would suggest addressing this by limiting the scope of the graph. Probably, we start with some string (presumably a URI), and we end with some string. But we don't get into showing what happens with that final string. Daniel's proposal does that part well.
Comment 10 Philippe Cloutier 2012-04-11 21:18:45 UTC
I said "we start with some string (presumably a URI)", but in fact, the parenthesis is wrong, as explained in http://httpd.apache.org/docs/trunk/en/mod/mod_rewrite.html#rewriterule

In Directory and htaccess context, the Pattern will initially be matched against the filesystem path, after removing the prefix that led the server to the current RewriteRule (e.g. "app1/index.html" or "index.html" depending on where the directives are defined).
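
As a small illustration of the quoted passage, the string a per-directory rule sees could be modelled like this. The helper name `per_directory_input` and the `/var/www` paths are made up for the example; this is a sketch of the prefix removal as the documentation describes it, not of the module's code.

```python
def per_directory_input(filesystem_path, directory_prefix):
    """Return the part of the path a per-directory Pattern is matched against.

    directory_prefix stands for the directory where the directives are
    defined (an assumption of this sketch); it is stripped from the front
    of the filesystem path, together with its trailing slash.
    """
    if not directory_prefix.endswith("/"):
        directory_prefix += "/"
    if filesystem_path.startswith(directory_prefix):
        return filesystem_path[len(directory_prefix):]
    return filesystem_path  # prefix does not apply: leave the path alone

# Directives defined in /var/www versus /var/www/app1 (hypothetical paths)
print(per_directory_input("/var/www/app1/index.html", "/var/www"))
print(per_directory_input("/var/www/app1/index.html", "/var/www/app1"))
```

The two calls reproduce the "app1/index.html" and "index.html" cases from the quote, so the starting string in this context is a relative filesystem path rather than a URI.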
Comment 11 Daniel Gruno 2012-04-12 07:30:15 UTC
Created attachment 28589 [details]
Updated rewrite chart, once more

I gave it another shot. Setting aside the whole debate about URL versus file-path, can we at least agree that in a URI context (Apache defines this as a URI, whether or not the http/https scheme is provided) this new chart is somewhat accurate in describing what happens?
Comment 12 Philippe Cloutier 2012-04-12 20:53:23 UTC
This addresses my issue and, as far as I know of mod_rewrite, has no regression. It may be correct when we only consider URIs.

I think this last diagram should be adopted. Thank you very much, Daniel.
Comment 13 Daniel Gruno 2012-04-12 21:36:57 UTC
I have committed the changes to the trunk now, and I'll consider this ticket closed (for the time being at least), since the original bug has been confirmed and resolved.

As for André's suggestions, I think these would best be handled in a new ticket or perhaps on the docs@ list, when we've all had some time to think them through. It is a complicated matter, and simply expanding the flow chart to include all flags and contexts would serve no other purpose than confusion, in my opinion. It's a chart that is meant to illustrate and simplify an example of a rewrite flow, not necessarily explain how the entire Apache server works ;)

So, I'm closing this ticket now, and if anyone finds my suggestion unreasonable, feel free to reopen it.