Description
In mod_python 3.3 a new function, apache.get_handler_root(), is available when
the new module importer is used. The function returns the directory specified
by the Directory directive in which the current Python*Handler was defined.
Where DirectoryMatch is used, or Directory with a ~ match, the value returned
always has any wildcards or regular expressions expanded and shows the true
physical directory matched by Apache for the request.
This function is effectively a wrapper around the value of req.hlist.directory,
but there is a bit more to it than that. The function is also callable while
modules are being imported, i.e., outside the context of the actual request
handler. This works because the new importer sets up a per-thread cache where
it stashes the information for access for the life of the request.
Further complications arise where req.add_handler() is used and no handler path
is supplied as its last argument. In that case req.hlist.directory is None, but
the handler path associated with the context in which req.add_handler() was
called can be determined by tracking back through req.hlist.parent until an
entry is found whose directory attribute is set. To save the user from doing
this, the value that apache.get_handler_root() returns has already had that
done where necessary.
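The walk back through parent contexts can be sketched roughly as below. This is only an illustration, not the mod_python implementation: HandlerContext is a hypothetical stand-in for the objects reached through req.hlist, and the directory used is made up.

```python
class HandlerContext:
    """Hypothetical stand-in for a req.hlist entry (illustration only)."""

    def __init__(self, directory=None, parent=None):
        self.directory = directory
        self.parent = parent


def find_handler_root(hlist):
    """Follow parent links until an entry with a directory is found."""
    node = hlist
    while node is not None:
        if node.directory is not None:
            return node.directory
        node = node.parent
    return None


# A handler set up from the Apache configuration, and a second one that
# was registered with req.add_handler() without a handler path.
configured = HandlerContext(directory="/var/www/app/")
added = HandlerContext(directory=None, parent=configured)
print(find_handler_root(added))
```

The second context has no directory of its own, so the lookup falls back to the directory of the context that registered it.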
The reason for making the handler root available when modules are being
imported is that it makes it a lot easier for web applications to use the
directory that the Python*Handler directive was defined for as an anchor point
for the application code. Further module imports or config files can then be
located relative to this directory dynamically, rather than having to hard code
paths in the Apache configuration using PythonOption. In using this though, one
does have to be careful that modules aren't shared between two handler roots,
using PythonInterpreter to separate two distinct web applications when
necessary.
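As a rough sketch of the anchor point idea, a module could work out the location of its config file from the handler root at import time. The helper name, the root directory and the file names here are all hypothetical; under mod_python 3.3 the root would come from apache.get_handler_root() rather than a hard-coded string.

```python
import os


def path_under_root(handler_root, *parts):
    """Build a normalised path beneath the given handler root."""
    return os.path.normpath(os.path.join(handler_root, *parts))


# Hypothetical handler root. In a real module this value would be
# obtained from apache.get_handler_root() while the module is imported.
HANDLER_ROOT = "/var/www/app/"
CONFIG_FILE = path_under_root(HANDLER_ROOT, "conf", "app.ini")
print(CONFIG_FILE)
```

Because the path is derived from the handler root, nothing needs to be duplicated in a PythonOption setting.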
This is all well and good if the Directory/DirectoryMatch directives are used,
but useless if the Location/LocationMatch directives are used. Where these are
currently used, apache.get_handler_root() and req.hlist.directory yield '/'. I
think originally I had the code returning an empty string, but when support for
expansion of wildcards was added and path normalisation done, the '/' was
getting returned instead.
For starters, instead of '/', None should be the result where
Location/LocationMatch directives are used. Second, there should really be an
equivalent attribute, req.hlist.location, which yields the leading part of the
URL that corresponds to the directory stored in req.hlist.directory. In effect
this yields an absolute base URL and would mean that it would no longer be
necessary to perform calculations like those described in:
http://www.modpython.org/pipermail/mod_python/2006-March/020501.html
for calculating handler base URLs where Directory/DirectoryMatch is used,
something that most people seem to get wrong from what I have seen.
An important thing about that code is that it only works when
Directory/DirectoryMatch is used. There is no way (at least that I know of) to
determine the expanded path corresponding to a Location/LocationMatch
directive. This is a major grumbling point for packages like Trac, MoinMoin,
Django and TurboGears, as it means that they have to require the user to
manually duplicate the path given to the directive in a PythonOption, or in
some other configuration mechanism, so that the package knows where its root
URL is.
Thus, if req.hlist.location can be supplied, this would solve the problem. As
for apache.get_handler_root(), I am not sure there really should be an
equivalent within the apache module, as knowing the location at the time of
import sounds a bit dubious to me, even if it might be useful where a package
performs configuration at time of import. It would be much more sensible for a
package to use the req.hlist.location value at the time of each request. One
option is to add a req.base_uri attribute or req.get_base_uri() method to the
request object. This would take into consideration the need to recurse back
through parent handler contexts where req.add_handler() is used, as with
req.hlist.directory.
In summary:
1. Change code so req.hlist.directory is None where Location/LocationMatch
directive is used.
2. Add req.hlist.location which gives the base URL, i.e., the leading path of
the URL, which equates to the directory specified by req.hlist.directory where
the directory has come from the Apache configuration.
3. Look at adding a new method or attribute to request object which provides
the base URL with value being inherited from parent handler contexts where
appropriate. Would need to select an appropriate name for this.
I think this is important enough to sneak it into mod_python 3.3, then we can
silence those other packages who grumble that it can't be determined.
When this is implemented, code in Session class can be changed from:
    dirpath = self._req.hlist.directory
    if dirpath:
        docroot = self._req.document_root()
        c.path = dirpath[len(docroot):]
    else:
        c.path = '/'
    if not c.path or not self._req.uri.startswith(c.path):
        c.path = '/'
to something that uses req.hlist.location instead. It will need to traverse through parent contexts if necessary to find the point at which req.hlist.location is not None.
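A minimal sketch of what the replacement logic might look like is given below. It assumes the proposed location attribute exists, which it does not at the time of writing; HandlerContext and cookie_path are hypothetical names used purely for illustration, and the fallback to '/' mirrors the existing Session behaviour.

```python
class HandlerContext:
    """Hypothetical stand-in for a req.hlist entry with a location."""

    def __init__(self, location=None, parent=None):
        self.location = location
        self.parent = parent


def cookie_path(hlist, uri):
    """Choose a cookie path from the nearest context with a location."""
    node = hlist
    # Traverse parent contexts until one has a location set.
    while node is not None and node.location is None:
        node = node.parent
    path = node.location if node is not None else None
    # Fall back to '/' when no location is known or it does not match
    # the request URI, as the existing code does.
    if not path or not uri.startswith(path):
        return '/'
    return path


outer = HandlerContext(location="/app")
inner = HandlerContext(parent=outer)
print(cookie_path(inner, "/app/page"))
```

In the Session class the two arguments would correspond to self._req.hlist and self._req.uri.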
There is a small chance that making this change will cause problems with existing setups which rely on the default being '/' when a Location directive is used.
A comment accompanying the above code in the Session class suggests the need to make sure that the originally proposed change works correctly when UserDir or Alias comes into play.