
Key: MODPYTHON-115
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Graham Dumpleton
Reporter: Graham Dumpleton
Votes: 0
Watchers: 0
mod_python

import_module() and multiple modules of same name.

Created: 27/Jan/06 02:24 PM   Updated: 05/Apr/07 11:38 AM
Component/s: core
Affects Version/s: 3.1.4, 3.2.7
Fix Version/s: 3.3.1

Time Tracking:
Not Specified

Issue Links:
dependent
 

Resolution Date: 12/Aug/06 08:17 AM


Description
The "apache.import_module()" function is a thin wrapper over the standard Python module importing system. This means that modules are still stored in "sys.modules". As modules in "sys.modules" are keyed by their module name, this in turn means that there can only be one active instance of a module for a specific name.
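The one-instance-per-name constraint of "sys.modules" can be seen with plain Python, independent of mod_python. The following is a minimal sketch using the modern Python 3 "importlib" module rather than mod_python's own code. The module name "demo_same_name" and the temporary directories are hypothetical and exist only for this illustration.

```python
import importlib
import os
import sys
import tempfile

MODULE_NAME = "demo_same_name"  # hypothetical module name used only in this sketch


def make_module(label):
    """Create a fresh directory holding MODULE_NAME.py that records its label."""
    directory = tempfile.mkdtemp()
    with open(os.path.join(directory, MODULE_NAME + ".py"), "w") as handle:
        handle.write("LABEL = %r\n" % label)
    return directory


dir_1 = make_module("subdir-1")
dir_2 = make_module("subdir-2")

# Import the copy that lives in the first directory.
sys.path.insert(0, dir_1)
importlib.invalidate_caches()
first = importlib.import_module(MODULE_NAME)

# Put the second directory ahead of the first and import the same name again.
sys.path.insert(0, dir_2)
importlib.invalidate_caches()
second = importlib.import_module(MODULE_NAME)

# The second import is answered from sys.modules, so the copy in dir_2 is never read.
same_object = second is first
label_seen = second.LABEL
print(same_object, label_seen)
```

Even though the second directory is ahead on "sys.path", the second import returns the module object already registered under that name.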

The "import_module()" function tries to work around this by comparing the path of the module already loaded against the one being requested and, if they differ, reloading the correct module. This path check only occurs, though, when the "path" argument is actually supplied to "import_module()". The "path" argument is supplied when mod_python.publisher calls "import_module()". It is not supplied when the "Python*Handler" directives are used, because in that case the module may actually be a system module and supplying "path" would prevent it from being found.

Even though mod_python.publisher supplies the "path" argument to the "import_module()" function, the check of the path has bugs, with modules possibly becoming inaccessible as documented in JIRA as MODPYTHON-9.

The check that mod_python makes of the path name of the actual code file for a module, to determine whether it should be reloaded, can also cause a continual cycle of module reloading even though the modules on disk have not changed. This occurs when successive requests alternate between URLs that map to distinct modules having the same name. This cyclic reloading is documented in JIRA as MODPYTHON-10.
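The cyclic reloading can be modelled with a toy name-keyed cache. This is a simplified simulation under stated assumptions, not mod_python's actual implementation: the function "import_by_name", the "cache" dictionary and the in-memory "sources" standing in for files on disk are all hypothetical.

```python
import types

cache = {}       # module name -> module object, standing in for sys.modules
load_count = 0   # how many times module source was (re)executed


def import_by_name(name, path, sources):
    """Return the cached module for name, reloading it if its file path differs."""
    global load_count
    module = cache.get(name)
    if module is None:
        module = cache[name] = types.ModuleType(name)
        module.__file__ = None
    if module.__file__ != path:
        # Path mismatch: re-execute the requested file into the same module object.
        exec(sources[path], module.__dict__)
        module.__file__ = path
        load_count += 1
    return module


# Two hypothetical files with the same module name in different directories.
sources = {
    "/site/subdir-1/index.py": "WHERE = 'subdir-1'\n",
    "/site/subdir-2/index.py": "WHERE = 'subdir-2'\n",
}

# Requests alternate between the two URLs; nothing changes "on disk".
for path in ["/site/subdir-1/index.py", "/site/subdir-2/index.py"] * 3:
    import_by_name("index", path, sources)

print(load_count)
```

In this model every one of the six alternating requests triggers a reload, because the single cache slot for "index" always holds the other file.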

Because a module is reloaded into the same object space as the existing module when two modules of the same name are in different locations, namespace pollution can also occur, along with security issues if one location for the module was public and the other private. This cross contamination of modules is documented in JIRA as MODPYTHON-11.
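The cross contamination follows from how reloading works in Python: code is re-executed in the existing module dictionary, and names that the new code does not redefine are left in place. A minimal sketch of that effect, using hypothetical source strings in place of the two "index.py" files:

```python
import types

# Hypothetical sources for two modules that share the name "index".
private_source = (
    "ADMIN_PASSWORD = 'secret'\n"
    "def handler():\n"
    "    return 'private'\n"
)
public_source = (
    "def handler():\n"
    "    return 'public'\n"
)

module = types.ModuleType("index")

# First load: the private module populates the namespace.
exec(private_source, module.__dict__)

# "Reload" with the public module's code into the same module object.
exec(public_source, module.__dict__)

# handler() is now the public one, but the private name is still reachable.
handler_result = module.handler()
leaked = getattr(module, "ADMIN_PASSWORD", None)
print(handler_result, leaked)
```

The public handler replaces the private one, yet the private module's "ADMIN_PASSWORD" remains visible through the shared module object.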

In respect of the "Python*Handler" directives where the "path" argument was never supplied to the "import_module()" function, the result would be that the first module loaded under the specified name would be used. Thus, any subsequent module of the same name referred to by a "Python*Handler" directive found in a different directory but within the same interpreter would in effect be ignored.

A caveat to this, though, is that such a "Python*Handler" directive would result in that handler's directory being inserted at the head of "sys.path". If the first instance of the module loaded under that name were at some point modified, the module would be automatically reloaded, but the reload would pick up the version from the other directory.

Now, although these problems as they relate to mod_python.publisher are addressed in mod_python 3.2.6, the underlying problems in "import_module()" are not. As the bug reports relating to mod_python.publisher have been closed off as resolved, I am creating this bug report to track the underlying problem as it applies to the "Python*Handler" directives and to explicit use of "import_module()".

To illustrate the issue as it applies to the "Python*Handler" directives, create two separate directories, each with a .htaccess file containing:

  AddHandler mod_python .py
  PythonHandler index
  PythonDebug On

In the "index.py" file in each separate directory put:

  import os
  from mod_python import apache

  def handler(req):
    req.content_type = 'text/plain'
    print >> req, os.getpid(), __file__
    return apache.OK

Assuming these are accessed as:

  /~grahamd/mod_python_9/subdir-1/index.py
  /~grahamd/mod_python_9/subdir-2/index.py

access the first URL, and the result will be:

  10665 /Users/grahamd/Sites/mod_python_9/subdir-1/index.py

now access the second URL and we get:

  10665 /Users/grahamd/Sites/mod_python_9/subdir-1/index.py

Note that this assumes the same child process handled both requests, so Apache must be configured to run a single child process for this test.

As one can see, it doesn't actually use the "subdir-2/index.py" module at all and still uses the "subdir-1/index.py" module.

If one modifies "subdir-1/index.py" so that its timestamp is updated and loads the second URL again, we get:

  10665 /Users/grahamd/Sites/mod_python_9/subdir-2/index.py

This occurs because mod_python detects the change in the first module loaded but, because the second handler directory is now at the head of sys.path, the reload picks up the module from that directory instead.
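The same effect can be reproduced outside Apache. The sketch below is an analogy only: it uses Python 3's "importlib.reload()" rather than the reload machinery mod_python relied on at the time, and the module name "demo_reload_target" and temporary directories are hypothetical.

```python
import importlib
import os
import sys
import tempfile

MODULE_NAME = "demo_reload_target"  # hypothetical module name used only in this sketch


def make_module(label):
    """Create a fresh directory holding MODULE_NAME.py that records its label."""
    directory = tempfile.mkdtemp()
    with open(os.path.join(directory, MODULE_NAME + ".py"), "w") as handle:
        handle.write("LABEL = %r\n" % label)
    return directory


dir_1 = make_module("subdir-1")
dir_2 = make_module("subdir-2")

# First handler: its directory goes to the head of sys.path and the module loads.
sys.path.insert(0, dir_1)
importlib.invalidate_caches()
module = importlib.import_module(MODULE_NAME)
label_before = module.LABEL

# Second handler: its directory is now at the head of sys.path.
sys.path.insert(0, dir_2)
importlib.invalidate_caches()

# A reload searches sys.path again by name, so it finds the copy in dir_2.
module = importlib.reload(module)
label_after = module.LABEL
file_after = module.__file__
print(label_before, label_after)
```

The module object is the same before and after, but after the reload it holds the code from the second directory.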

These issues with modules of the same name in multiple locations are listed as ISSUE 14 in my list of module importer problems. See:

  http://www.dscpl.com.au/articles/modpython-003.html


Graham Dumpleton added a comment - 10/Mar/06 01:40 PM
Linked issues which will be addressed by rewritten module importer and top level handler dispatcher.

Graham Dumpleton made changes - 10/Mar/06 01:40 PM
  Field: Original Value -> New Value
  Link: (none) -> This issue depends upon MODPYTHON-143 [ MODPYTHON-143 ]
Graham Dumpleton made changes - 01/Apr/06 01:37 PM
  Assignee: (none) -> Graham Dumpleton [ grahamd ]
Graham Dumpleton made changes - 01/Apr/06 01:57 PM
  Status: Open [ 1 ] -> In Progress [ 3 ]
Graham Dumpleton added a comment - 12/Aug/06 08:17 AM
Resolved by the new module importer for 3.3, but for the 3.3 release it looks like the new importer will have to be enabled explicitly, as it will not be the default.

In the new module importer, modules are not stored in sys.modules and are instead stored in a separate caching system where they are distinguished by the full pathname to the module file.
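The idea of a cache keyed by full pathname can be sketched as follows. This is an illustration only and not the mod_python 3.3 importer: the function "import_by_path" and the "_path_cache" dictionary are hypothetical, and the sketch uses Python 3's "importlib.util" and omits reloading on file change.

```python
import importlib.util
import os
import sys
import tempfile

_path_cache = {}  # absolute file path -> module object (hypothetical cache)


def import_by_path(path):
    """Load the module at path, caching by full path rather than by name."""
    path = os.path.abspath(path)
    module = _path_cache.get(path)
    if module is None:
        name = os.path.splitext(os.path.basename(path))[0]
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        # Execute the file without registering the module in sys.modules.
        spec.loader.exec_module(module)
        _path_cache[path] = module
    return module


def make_index(label):
    """Create a fresh directory holding an index.py that records its label."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "index.py")
    with open(path, "w") as handle:
        handle.write("LABEL = %r\n" % label)
    return path


path_1 = make_index("subdir-1")
path_2 = make_index("subdir-2")

mod_1 = import_by_path(path_1)
mod_2 = import_by_path(path_2)
print(mod_1.LABEL, mod_2.LABEL, mod_1 is mod_2)
```

Both "index.py" files are loaded as distinct module objects, and a repeated request for the same path returns the cached object for that path.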

Note though that the new module importer doesn't work with Python packages. Python packages must be in a directory appearing on sys.path and when needed will be imported by the standard Python import mechanism. Python packages are not candidates for automatic reloading and, because they are still stored in sys.modules, must have a unique top level name.

Graham Dumpleton made changes - 12/Aug/06 08:17 AM
  Status: In Progress [ 3 ] -> Resolved [ 5 ]
  Resolution: (none) -> Fixed [ 1 ]
  Fix Version/s: (none) -> 3.3 [ 12310101 ]
Graham Dumpleton made changes - 05/Apr/07 11:38 AM
  Status: Resolved [ 5 ] -> Closed [ 6 ]