Key: MODPYTHON-146
Type: Bug
Status: Open
Priority: Major
Assignee: Graham Dumpleton
Reporter: Graham Dumpleton
Votes: 0
Watchers: 0
mod_python

ap_internal_fast_redirect() and request object cache

Created: 13/Mar/06 07:33 PM   Updated: 01/Apr/06 02:06 PM
Component/s: core
Affects Version/s: 3.2.8
Fix Version/s: None

Time Tracking: Not Specified

Description
mod_python uses a Python class to wrap the Apache request_rec structure. The primary purpose of the request object wrapper is to access the request_rec internals. One of the other features of the request object wrapper is that handlers can add their own attributes to it, to facilitate communication of information between handlers. This works because, before creating a fresh request object wrapper, a handler will look up whether a request object has already been created for the request as a whole, and will use the existing one instead.

All in all this generally works okay. However, the DirectoryIndex directive and ap_internal_fast_redirect() do cause undesirable behaviour in specific cases.

Now when a request is made against a directory, this is detected by mod_dir, which, in a handler hooked as LAST in the fixup handler phase, uses ap_internal_fast_redirect() to determine if any of the files mentioned in the DirectoryIndex directive exist. What this function does is run through all request phases up to and including the fixup handler phase for the file which would be matched for the entry in DirectoryIndex. If the status comes back OK, indicating the request could be satisfied, it copies all the information from the sub request's request_rec structure into the parent request_rec structure. It will then proceed with this information to execute the content handler phase.

The problem is that ap_internal_fast_redirect() knows only about the request_rec structure and nothing about the Python request object wrapper. As a consequence, the request object created for the sub request which worked and ran through to the fixup handler phase is ignored, and the one originally created for the parent request continues to be used. The result is that any attributes added to the sub request's request object by handler phases up to and including the fixup handler are lost.
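
The behaviour described above can be sketched without Apache at all. The following is a toy model only, not mod_python's code: RequestRec, RequestObject and fast_redirect() are hypothetical stand-ins, and the "wrapper" key in request_config is an assumption made for illustration.

```python
class RequestRec:
    """Stand-in for Apache's request_rec; only the fields the sketch needs."""
    def __init__(self, uri):
        self.uri = uri
        self.filename = None
        self.request_config = {}   # per-request module data; holds the wrapper


class RequestObject:
    """Stand-in for the Python request object wrapper."""
    def __init__(self, request_rec):
        self.request_rec = request_rec


def get_request_object(request_rec):
    # Reuse the wrapper cached on this request_rec, or create one.
    wrapper = request_rec.request_config.get("wrapper")
    if wrapper is None:
        wrapper = RequestObject(request_rec)
        request_rec.request_config["wrapper"] = wrapper
    return wrapper


def fast_redirect(parent, sub):
    # Copies plain request_rec fields only; the sub request's
    # request_config, and so its wrapper, is left behind.
    parent.uri = sub.uri
    parent.filename = sub.filename


parent = RequestRec("/docs/")
get_request_object(parent)                 # wrapper created in an early phase

sub = RequestRec("/docs/index.html")
sub.filename = "/var/www/docs/index.html"
get_request_object(sub).user_data = "set in fixup handler"

fast_redirect(parent, sub)

content_req = get_request_object(parent)   # what the content handler would see
print(content_req.request_rec.uri)         # /docs/index.html
print(hasattr(content_req, "user_data"))   # False
```

The content handler sees the sub request's URI and filename, but the attribute set during the sub request's fixup phase is gone because it lived on a different wrapper.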

What possibly needs to be done is that the get_request_object() function in mod_python needs to add to req->notes a tag which identifies the instance of the request object which has been created. Because req->notes will be overlaid by the notes table contents from the sub request, it will be able to detect when this copy of sub request data into the parent has occurred. It can then decide to switch to the request object created for the sub request, updating the request_rec member to point to the parent request_rec instead.
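
A minimal sketch of that idea, again as a pure Python model rather than the real C function. The notes key name, the _wrappers registry and the "wrapper" config key are all hypothetical, and fast_redirect() assumes an idealised overlay in which the sub request's notes entry simply replaces the parent's. The first follow-up comment on this issue reports that the real merge does not behave that cleanly.

```python
import itertools

NOTES_KEY = "mod_python.request_id"   # hypothetical notes key
_ids = itertools.count(1)
_wrappers = {}                        # hypothetical registry: tag -> wrapper


class RequestRec:
    """Stand-in for request_rec with a notes table and module config."""
    def __init__(self, uri):
        self.uri = uri
        self.notes = {}
        self.request_config = {}


class RequestObject:
    def __init__(self, request_rec, tag):
        self.request_rec = request_rec
        self.tag = tag


def get_request_object(rec):
    cached = rec.request_config.get("wrapper")
    tag = rec.notes.get(NOTES_KEY)
    if cached is not None and tag == cached.tag:
        return cached                  # notes still carry our own tag
    if tag in _wrappers:
        # The tag belongs to another wrapper: notes were overwritten by a
        # sub request, so adopt its wrapper and re-point it at this rec.
        adopted = _wrappers[tag]
        adopted.request_rec = rec
        rec.request_config["wrapper"] = adopted
        return adopted
    wrapper = RequestObject(rec, str(next(_ids)))
    _wrappers[wrapper.tag] = wrapper
    rec.notes[NOTES_KEY] = wrapper.tag
    rec.request_config["wrapper"] = wrapper
    return wrapper


def fast_redirect(parent, sub):
    # Idealised overlay (an assumption): sub request notes entries win.
    parent.uri = sub.uri
    parent.notes.update(sub.notes)


parent = RequestRec("/docs/")
get_request_object(parent)
sub = RequestRec("/docs/index.html")
get_request_object(sub).user_data = "set in fixup handler"

fast_redirect(parent, sub)

req = get_request_object(parent)
print(req.user_data)               # set in fixup handler
print(req.request_rec is parent)   # True
```

The mismatch between the tag in the notes table and the tag of the cached wrapper is what signals that a fast redirect has copied sub request data into the parent.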

What may also come out of storing an id for a request object in the req->notes table is that when an internal redirect occurs, instead of a fresh request object wrapper instance being created to use for the req.main attribute, it can use the id in req->notes to actually get hold of the real request object of the parent and chain to it properly. Doing this will mean then that a sub request will be able to access attributes added into a request object of the parent request, something which can't currently be done.
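
The chaining could look roughly like the following toy model. It is not mod_python's implementation; the notes key and the _wrappers registry are hypothetical names used only to show the lookup.

```python
NOTES_KEY = "mod_python.request_id"   # hypothetical notes key
_wrappers = {}                        # hypothetical registry: id -> wrapper


class RequestRec:
    """Stand-in for request_rec; main points at the parent request_rec."""
    def __init__(self, uri, main=None):
        self.uri = uri
        self.main = main
        self.notes = {}


class RequestObject:
    def __init__(self, request_rec):
        self.request_rec = request_rec
        request_id = str(id(self))
        request_rec.notes[NOTES_KEY] = request_id
        _wrappers[request_id] = self

    @property
    def main(self):
        parent_rec = self.request_rec.main
        if parent_rec is None:
            return None
        # Resolve the parent's existing wrapper through its notes entry
        # rather than wrapping parent_rec afresh.
        return _wrappers.get(parent_rec.notes.get(NOTES_KEY))


parent_rec = RequestRec("/docs/")
parent = RequestObject(parent_rec)
parent.user_data = "set by parent handler"

sub = RequestObject(RequestRec("/docs/index.html", main=parent_rec))
print(sub.main is parent)     # True
print(sub.main.user_data)     # set by parent handler
```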

Now, if you understand everything I have said above, you have done well. ;-)

Depending on whether people do understand or not, when I get a chance I'll try and attach some examples of handlers which demonstrate the problem.

Acknowledgements that you understand the issue appreciated.

Graham Dumpleton added a comment - 15/Mar/06 05:30 PM
The idea of using req->notes to store an ID for the request object will not be usable in practice, or at least not easily. This is because ap_internal_fast_redirect() wrongly merges the contents of the notes table from the main request and the sub request. This issue has been brought up on the main Apache developers mailing list:

  http://www.mail-archive.com/dev%40httpd.apache.org/msg31263.html

The responses indicate it is a bug, but indications are that it possibly will not be truly fixed until Apache 2.4 by using a proper internal redirect rather than a fast redirect.

  http://www.mail-archive.com/dev%40httpd.apache.org/msg31264.html
  http://www.mail-archive.com/dev%40httpd.apache.org/msg31265.html
  http://www.mail-archive.com/dev%40httpd.apache.org/msg31266.html

Even if mod_python tried to implement a solution for use of the wrong request object, the way that req.notes is wrongly merged from sub request to main request means that data in that table is going to be stuffed up and possibly not useable. As a consequence, there doesn't seem to be much point in trying to fix this problem.

The only workaround for this whole issue at this point is for a user to implement a fixup handler which hijacks the DirectoryIndex mapping mechanism, determines what file to redirect to and executes an internal redirect instead. For example:

  import os
  from mod_python import apache

  def fixuphandler(req):
      if os.path.isdir(req.filename):
          uri = req.uri + 'index.html'
          if req.args: uri += '?' + req.args
          req.internal_redirect(uri)
          return apache.DONE
      return apache.OK

This should really just use request_rec->finfo->filetype, but mod_python doesn't seem to expose the file type attribute through the request object. Thus, it is necessary to perform a redundant check of whether the target is a directory.

Alternatively, could use:

  from mod_python import apache

  def fixuphandler(req):
      # mod_mime marks directories with this magic content type.
      if req.content_type == 'httpd/unix-directory':
          uri = req.uri + 'index.html'
          if req.args: uri += '?' + req.args
          req.internal_redirect(uri)
          return apache.DONE
      return apache.OK

as mod_mime seems to set content type to this magic value when request_rec->finfo->filetype is APR_DIR, i.e., a directory. Should mod_python provide this magic content type as apache.DIR_MAGIC_TYPE?

Note that these examples assume that the index file exists, but a handler could just as easily iterate over a list of possible targets until one is found. It would get even more complicated if one had to consider different choices based on language.
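
Iterating over candidates might look like the sketch below. INDEX_FILES and find_index_uri() are hypothetical names, and the candidate list is made up for illustration. The helper also leaves URIs without a trailing slash alone, so that Apache can issue its own redirect for those. The mod_python import sits inside the handler only so that the helper can be exercised without mod_python installed.

```python
import os

# Hypothetical candidate list, in the spirit of a DirectoryIndex setting.
INDEX_FILES = ("index.html", "index.htm", "index.py")


def find_index_uri(uri, args, directory, candidates=INDEX_FILES):
    """Return the URI to redirect to, or None if no redirect applies."""
    # Slash-less directory URIs are left for Apache to redirect.
    if not uri.endswith("/"):
        return None
    for name in candidates:
        if os.path.isfile(os.path.join(directory, name)):
            target = uri + name
            if args:
                target += "?" + args
            return target
    return None


def fixuphandler(req):
    # Imported here so the helper above can be used without mod_python.
    from mod_python import apache

    if os.path.isdir(req.filename):
        target = find_index_uri(req.uri, req.args, req.filename)
        if target is not None:
            req.internal_redirect(target)
            return apache.DONE
    return apache.OK
```

When no candidate exists the handler returns apache.OK and the request carries on through Apache's normal handling.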

Repository: ASF
Revision: #388487
Date: Fri Mar 24 11:06:08 UTC 2006
User: grahamd
Message: Moved where python_filter() accesses per request Python config to after the
request config object is acquired. Needed as way ap_internal_fast_redirect()
is used by Apache to implement DirectoryIndex was creating scenario where
per request config was null in parent request when accessed, causing a crash.
(MODPYTHON-103) (MODPYTHON-146)
Files Changed
MODIFY /httpd/mod_python/trunk/src/mod_python.c
MODIFY /httpd/mod_python/trunk/lib/python/mod_python/__init__.py
MODIFY /httpd/mod_python/trunk/src/include/mpversion.h

Repository: ASF
Revision: #388492
Date: Fri Mar 24 11:26:05 UTC 2006
User: grahamd
Message: When python_filter couldn't find an actual mod_python filter handler it was
returning DECLINED, but not releasing the interpreter. By rights this
scenario should never have happened, but issues caused by use of the
ap_internal_fast_redirect() function by Apache to implement DirectoryIndex
directive was causing it, as details of filters weren't being copied from
sub request object to parent. End result was further requests against that
interpreter in that process would hang as lock for interpreter couldn't be
acquired. Would also cause shutdown to hang with child processes needing to
be killed off forcibly by parent Apache process. (MODPYTHON-146)
Files Changed
MODIFY /httpd/mod_python/trunk/src/mod_python.c
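
The failure mode in the second commit message is an early return that skips releasing a lock. As an analogy only (the actual fix is in C in mod_python.c), here is the same pattern in Python, with a hypothetical run_filter() function and a threading.Lock standing in for the interpreter lock:

```python
import threading

interpreter_lock = threading.Lock()   # stand-in for the per-interpreter lock
DECLINED, OK = -1, 0                  # values mirror Apache's constants


def run_filter(handlers, name):
    interpreter_lock.acquire()
    try:
        handler = handlers.get(name)
        if handler is None:
            # Early exit: without the finally clause below, the lock
            # would stay held and every later caller would block.
            return DECLINED
        handler()
        return OK
    finally:
        interpreter_lock.release()


print(run_filter({}, "missing") == DECLINED)   # True
print(interpreter_lock.locked())               # False
```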

Graham Dumpleton added a comment - 24/Mar/06 07:00 PM
In addition to the sub request Python request object being ignored, ap_internal_fast_redirect() also doesn't transfer anything held in the sub request req->request_config attribute. As well as holding the Python request object, this holds information about dynamically registered handlers and input/output filters. Thus the specifics of handlers and input/output filters registered by the sub request are lost.

Graham Dumpleton added a comment - 24/Mar/06 07:39 PM
Correction to the previous recipe for explicitly using an internal redirect to get around the problems described above. The code should only perform the redirect when req.uri already ends with a slash, otherwise it will override Apache's inbuilt mechanism for adding the trailing slash by sending a redirect back to the client.

Using features now available because of other changes, should say:

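from mod_python import apache

def fixuphandler(req):
    # Only act on directories, using the file type Apache already determined.
    if req.finfo[apache.FINFO_FILETYPE] == apache.APR_DIR:
        # Leave URIs without a trailing slash to Apache's own redirect.
        if req.uri[-1] == '/':
            uri = req.uri + 'index.html'
            if req.args: uri += '?' + req.args
            req.internal_redirect(uri)
            return apache.DONE
    return apache.OK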

Graham Dumpleton added a comment - 01/Apr/06 02:06 PM
A more thorough analysis of the DirectoryIndex problems was posted on the mod_python developers mailing list:

  http://www.mail-archive.com/python-dev@httpd.apache.org/msg01736.html