mod_python / MODPYTHON-143

Implement and integrate a new module importer.

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.8
    • Fix Version/s: 3.3.1
    • Component/s: importer
    • Labels:
      None

      Description

      This is an overall task to cover the issue of rectifying the various module importer issues by replacing it with a new implementation. A description of the various problems can be found in:

      http://www.dscpl.com.au/articles/modpython-003.html

      Separate issues had already been created for some of the specific problems. These issues will now be linked to this problem and thus marked as being dependent on this issue.

      In other words, replacing the module importer will solve a number of issues. Rather than try to keep all the separate issues up to date, all information about the replacement will be put against this issue instead.

      Note that there are also some issues which are not directly related to the module importer but which will be made dependent on this issue because it is easier to fix the issue as part of the rewrite of the module importer and top level handler dispatch mechanism than it is to address it as a distinct item.

      How the new module importer implementation may affect existing code, and how its usage may change, will be documented in the following document for the time being:

      http://www.dscpl.com.au/articles/modpython-007.html

      Note that this document is a work in progress. It is dense reading and assumes you know a bit about the current module importer and its problems. Any significant issues raised by this document can be added here as a comment, or if a general discussion of a topic is needed, raise the issue on the mod_python developers mailing list.

      A possible new implementation for the module importer is basically ready for testing and experimentation. The intent is to push it into the mod_python source tree, but for its use to be optional.

      If wanting to enable it for a specific Python interpreter, the PythonImport directive would be used:

      PythonImport mod_python.future.importer mytestinterpreter

      If wanting to enable it for all Python interpreters, a PythonOption directive would be used at global scope within the Apache configuration, i.e., outside of all Location, Directory or Files container directives. The exact option name to be used hasn't yet been decided.

      More details and announcements at the appropriate time.

          Activity

          grahamd Graham Dumpleton added a comment -

          Linked issues which will be addressed by rewritten module importer and top level handler dispatcher.

          jgallacher Jim Gallacher added a comment -

          Graham's new importer uses the new Python import hooks described in http://www.python.org/doc/peps/pep-0302. These hooks were introduced in Python 2.3a1. Thus the new importer would mean officially dropping support for Python 2.2. I don't have a problem with this, but I want to make sure we don't lose sight of this fact.
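
For readers unfamiliar with import hooks, the following is a minimal sketch of the general idea, not the mod_python importer itself. It assumes a current Python 3, where the hook protocol is the find_spec()/exec_module() successor to the PEP 302 find_module()/load_module() methods. The class name DictFinder and the module name mp_demo_settings are hypothetical.

```python
import importlib.abc
import importlib.util
import sys


class DictFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader that serves modules from a dict of source text."""

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path=None, target=None):
        # Claim only the names we know about; returning None lets the
        # remaining finders on sys.meta_path handle everything else.
        if fullname in self.sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        # None asks the import machinery to create a default module object.
        return None

    def exec_module(self, module):
        # Run the stored source text in the new module's namespace.
        exec(self.sources[module.__name__], module.__dict__)


finder = DictFinder({"mp_demo_settings": "VALUE = 42\n"})
sys.meta_path.insert(0, finder)

import mp_demo_settings  # resolved by DictFinder, not by the file system
```

The point of the sketch is only that a hook placed on sys.meta_path gets the first chance to decide where a module comes from, which is the capability the new importer depends on.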

          grahamd Graham Dumpleton added a comment -

          For technical reasons related to ensuring that the new importer takes over from the old one at the right time, the original plan of using PythonImport to enable the new importer had to be abandoned. Instead, the new importer is enabled by doing one of the following.

          To enable use of the new importer for all Python interpreter instances, in other words for everything, it is necessary to specify:

          PythonOption mod_python.future.importer *

          To enable use of the new importer for selected Python interpreter instances, instead of "*" a comma separated list of interpreter names can be specified. Thus, to enable for a single interpreter called "testing-1", use:

          PythonOption mod_python.future.importer testing-1

          Or for both "testing-1" and "testing-2" interpreters, use:

          PythonOption mod_python.future.importer testing-1,testing-2

          In all cases, the PythonOption must be in the main Apache configuration files outside of any VirtualHost, Directory, Files or Location directives.
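
The matching rule described above ("*" for every interpreter, otherwise a comma separated list of names) can be sketched as follows. This is an illustration of the rule only, not the code inside mod_python; the function name importer_enabled is hypothetical, and tolerating whitespace around names is an assumption.

```python
def importer_enabled(option_value, interpreter_name):
    """Return True if the option value selects the given interpreter.

    option_value is the string given to the PythonOption, or None if
    the option was not set at all.
    """
    if option_value is None:
        return False
    # Assumption: surrounding whitespace and empty entries are ignored.
    names = [name.strip() for name in option_value.split(",") if name.strip()]
    return "*" in names or interpreter_name in names


# Example checks using the interpreter names from the comment above.
all_enabled = importer_enabled("*", "testing-1")
one_enabled = importer_enabled("testing-1", "testing-2")
two_enabled = importer_enabled("testing-1,testing-2", "testing-2")
```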

          grahamd Graham Dumpleton added a comment -

          New importer committed.

          Note that if you wish to run the test suite using the new importer, you need to edit "test/test.py" and uncomment the line:

          PythonOption('mod_python.future.importer *'),

          in the makeConfig() function.

          Time to get back to documenting what this new importer is all about.

          grahamd Graham Dumpleton added a comment -

          Going to be a pain and change the name of the PythonOption used to enable the new importer. The reason for this is that I am coming to the belief that making the new importer the default in mod_python 3.3 is probably not a good idea. Instead, mod_python 3.3 when released should allow both to exist with the default being the old importer. When the user is ready, they can make a conscious decision to turn on the new importer and then modify their code if necessary to ensure it works.

          The thinking here is that even though we may be able to produce some really good documentation that says how the new importer works, many aren't going to bother to check the documentation and ensure their code works first. Some may not even have a choice given that it may be dictated by an ISP who makes the decision to install the new version. Also, you have projects like Django which rely on mod_python working in a certain way and there will probably be a lot of complaints if we dump out a new version of mod_python which breaks it in some way. Depending on their release schedules they may not be in a position to release an updated version quickly. Thus we need to allow for a transition period where both options exist in an official release version; we can't expect people to use the version out of Subversion, or even final test tarballs.

          As to turning on the new importer as the default, I would suggest that after mod_python 3.3 is out and bedded down, then we make a swap as to which is the default importer and release it as a new major version 4.0 so as to clearly signify that there are sufficient differences in how parts of the importer work. The old importer could be removed from the code in some future version after that.

          This new major version could also be a good point to drop default support for old option names as talked about in MODPYTHON-127.

          That all said, the new PythonOption I am going to use is:

          mod_python.use_new_importer

          The argument to the option would be as before. That is "*" for all interpreters, or a comma separated list of interpreters. The option will still need to be defined at global scope in main Apache configuration outside of all configuration containers.

          On this basis I will now make some of the changes to the C code bits of mod_python which need to be made so that new importer works properly in all cases. Such code though will only be run when the above option lists the interpreter the request is executing in. I'll also have to make the test harness suite be able to cope with running extra tests only for the new module importer, so how new importer is enabled in the test suite will change as well.

          grahamd Graham Dumpleton added a comment -

          As per previous comments in:

          http://www.modpython.org/pipermail/mod_python/2006-May/021095.html

          have renamed special module variables put into all modules loaded by the new module importer.

          Name changes were:

          __info__ --> __mp_info__
          __clone__ --> __mp_clone__
          __purge__ --> __mp_purge__

          In addition, instead of appending directories to __info__.path to specify additional directories to search for modules when using the importer, the module variable __mp_path__ should now be used.

          grahamd Graham Dumpleton added a comment -

          This issue is basically complete and is just lacking some documentation.

          The name of the PythonOption setting was never changed and is still mod_python.future.importer.

          As stated before, to enable use of the new importer for all Python interpreter instances, in other words for everything, it is necessary to specify:

          PythonOption mod_python.future.importer *

          To enable use of the new importer for selected Python interpreter instances, instead of "*" a comma separated list of interpreter names can be specified. Thus, to enable for a single interpreter called "testing-1", use:

          PythonOption mod_python.future.importer testing-1

          Or for both "testing-1" and "testing-2" interpreters, use:

          PythonOption mod_python.future.importer testing-1,testing-2

          In all cases, the PythonOption must be in the main Apache configuration files outside of any VirtualHost, Directory, Files or Location directives.

          It still looks like use of this new importer will be optional in mod_python 3.3 and enabled as default only in some subsequent version.

          grahamd Graham Dumpleton added a comment -

          An issue which still needs to be looked at with the new module importer is that, for historical reasons, the __name__ attribute put in modules is concocted from an md5 hash of the full pathname of the module file. The reason for this is that the implementation that the importer was based on attempted to still store modules in sys.modules, and in doing so the module name couldn't contain various characters that can appear in pathnames, e.g., slash, colon etc. This way of setting up the module name has persisted and not been changed.

          Problem is that it seems that use of certain third party packages can well and truly stuff up md5 generation in Python. See:

          http://www.modpython.org/pipermail/mod_python/2006-June/021482.html

          and all the followup posts.

          This may not be a big issue in as much as you probably do not want to try and be resilient to such a problem, as for md5 hashes it would be very important for any underlying problem to be fixed.

          Now the __name__ attribute in modules could possibly be replaced with the name of the file, i.e., the same as __file__, but there could be other code out there which makes assumptions about what sort of characters appear in the __name__ attribute and relies on that somehow.
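
As an illustration of why a hash was attractive here, the sketch below derives a module label from a file path. The exact scheme used by the importer is not given in this issue, so the "_mp_" prefix, the UTF-8 encoding and the function name module_label are assumptions made for the example.

```python
import hashlib


def module_label(path):
    """Derive a name from a file path that is safe to use as a module name.

    The hex digest contains only 0-9 and a-f, so characters that are
    legal in paths (slash, colon, dot, space) can never leak into the name.
    """
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    # Hypothetical prefix so the label does not start with a digit.
    return "_mp_" + digest


label = module_label("/var/www/site/sub dir/page.py")
same_label = module_label("/var/www/site/sub dir/page.py")
other_label = module_label("/var/www/site/other/page.py")
```

The same path always yields the same label, and two files with the same base name in different directories get different labels, which is the property the importer relies on. The trade-off raised above remains: code that inspects the name attribute sees an opaque hash rather than anything resembling the file name.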

          grahamd Graham Dumpleton added a comment -

          The new module importer implements a feature whereby, when using apache.import_module() or the 'import' statement, it will first look for the target module in the same directory as the code file doing the import. This brings it in line with how things normally work for Python modules outside of mod_python and should eliminate the hacks people have had to do in the past involving setting the PythonPath directive to directories which are a part of the document tree exposed by mod_python.

          The only problem with making it look in the same directory first is that it changes the behaviour for code published using mod_python.publisher (and other cases, but this is most visible), where the published code was in some subdirectory of the root directory for which the PythonHandler directive was specified. Previously when the import was done, the expectation was (although the randomness of sys.path meant it may not have been the case in practice) that the root directory for where the handler directive was specified would be checked and then anywhere else on sys.path. That is, it wouldn't look in the same directory first when it was in a subdirectory.

          What this means is that, with the old importer, if the subdirectory contained a sibling code file called 'random.py' and the first code file tried to import 'random', the 'random.py' file in the same directory would be ignored and it would go off and use the 'random' module distributed with Python instead. With the new importer though it will look in the same directory first, resulting in a greater risk that a file local to the directory will hide and prevent the importing of a standard Python module.

          A mechanism is therefore needed to resolve such clashes and state that any imports from a specific file should ignore some or perhaps all modules in the same directory when performing an import.

          The simplest way of doing this would be to allow a code file to specify a special global variable listing any module or file names from the current directory which should be ignored. For example something like:

          __mp_ignore_local_modules__ = ['random']

          import random

          Might need to come up with a better name for the global variable though.
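
To make the proposed behaviour concrete, here is a small sketch of a lookup that checks the importing file's own directory first unless the requested name is on an ignore list. It is not the importer's actual search code; the function name find_module_file, its parameters, and the restriction to plain ".py" files are assumptions for the example.

```python
import os
import tempfile


def find_module_file(name, importing_file, search_path, ignore_local=()):
    """Return the path of name + '.py', or None if it is not found.

    The directory of importing_file is searched first, unless name is
    listed in ignore_local, in which case only search_path is used.
    """
    directories = []
    if name not in ignore_local:
        directories.append(os.path.dirname(os.path.abspath(importing_file)))
    directories.extend(search_path)
    for directory in directories:
        candidate = os.path.join(directory, name + ".py")
        if os.path.isfile(candidate):
            return candidate
    return None


# Demo layout: a 'random.py' next to the importing file, and another
# 'random.py' in a directory standing in for the standard library.
local_dir = tempfile.mkdtemp()
stdlib_dir = tempfile.mkdtemp()
for directory in (local_dir, stdlib_dir):
    with open(os.path.join(directory, "random.py"), "w") as handle:
        handle.write("# placeholder\n")
page = os.path.join(local_dir, "page.py")

default_hit = find_module_file("random", page, [stdlib_dir])
ignored_hit = find_module_file("random", page, [stdlib_dir],
                               ignore_local=["random"])
```

With no ignore list the sibling file wins; with 'random' listed, the lookup skips the local directory and falls through to the search path, which is the effect the proposed global variable is meant to have.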

          grahamd Graham Dumpleton added a comment -

          Still need to update new module importer code so as to log exceptions which might occur in __mp_clone__ and __mp_purge__ hooks.
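
One way such logging could look is sketched below: an optional hook is looked up on a module, called, and any exception is logged rather than propagated. This is an assumption about the shape of the fix, not the committed code; the helper name call_hook, the logger name, and the arguments passed to hooks are all hypothetical.

```python
import logging
import types

log = logging.getLogger("hypothetical.importer")


def call_hook(module, hook_name, *args):
    """Call an optional hook on a module, logging any exception it raises.

    Returns True if the hook existed and completed, False otherwise.
    """
    hook = getattr(module, hook_name, None)
    if hook is None:
        return False
    try:
        hook(*args)
    except Exception:
        # log.exception records the message together with the traceback.
        log.exception("%s hook failed in module %s",
                      hook_name, getattr(module, "__name__", "?"))
        return False
    return True


# Demo: a module whose purge hook raises, and one with no hook at all.
broken = types.ModuleType("broken_demo")


def failing_purge():
    raise RuntimeError("cleanup failed")


setattr(broken, "__mp_purge__", failing_purge)
plain = types.ModuleType("plain_demo")

broken_result = call_hook(broken, "__mp_purge__")
plain_result = call_hook(plain, "__mp_purge__")
```

Swallowing the exception after logging it means a faulty hook in one module cannot abort the reload of others, at the cost of the failure only being visible in the log.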

          grahamd Graham Dumpleton added a comment -

          Basic documentation added for apache.import_module() and all outstanding code changes complete, so finally time to mark this as resolved. Time to roll out 3.3 and thence onward to 3.4/4.0 and beyond.


            People

            • Assignee:
              grahamd Graham Dumpleton
              Reporter:
              grahamd Graham Dumpleton