Issue Details (XML | Word | Printable)

Key: MODPYTHON-93
Type: Improvement Improvement
Status: Closed Closed
Resolution: Fixed
Priority: Minor Minor
Assignee: Graham Dumpleton
Reporter: Jim Gallacher
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
mod_python

Improve util.FieldStorage efficiency

Created: 27/Nov/05 05:35 AM   Updated: 02/Apr/07 11:40 AM
Return to search
Component/s: core
Affects Version/s: 3.2.7
Fix Version/s: 3.3.1

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works modpython325_util_py_dict.patch 2005-12-02 03:47 PM Mike Looijmans 8 kB

Resolution Date: 22/Nov/06 11:18 AM


 Description  « Hide
Form fields are saved as a list in a FieldStorage class instance. The class implements a __getitem__ method to provide dict-like behaviour. This method iterates over the complete list for every call to __getitem__. Applications that need to access all the fields when processing the form will show O(n^2) behaviour where n == the number of form fields. This overhead could be avoided by creating a dict (to use as an index) when the FieldStorage instance is created.

Mike Looijmans has been investigating StringField and Field as well. It is probably reasonable to include information on his work in this issue as well, so that we can consider all of these efficiency issues in toto.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Jim Gallacher added a comment - 27/Nov/05 06:36 AM
The FieldStorage methods get, getitems, has_key, __len__, getfirst and getlist *all* iterate over the complete list of fields each time they are called and so would all benefit from some kind of indexing scheme. Here is a bit of code that will give you some idea of what I have in mind.

class FieldStorage:

    def __init__():
        self.index = {} # holds references to the Field objects and can be used as an index.
        self.list = []

        ... blah blah blah ...
        ... create the field and get it's name ...

        self.list.append(field)
        if name in self.index:
            self.index[name].append(field)
        else:
            self.index[name] = [field,]

    def __getitem__(self, key):
        if key not in self.index:
            raise KeyError, key
       found = self.index[key]
       if len(found) == 1:
           return found[0]
       else:
           return found

The other FieldStorage methods would need to be refactored to take advantage of the index.

Mike Looijmans added a comment - 02/Dec/05 03:47 PM
What it does:
- Simplifies the creation of StringField objects. This was already
marked as a TODO in the 3.2.5b code.
- Does not create a file object (cStringIO) for each and every field.
- FieldStorage.get() will always return the same instance(s) for any
given name.
- FieldStorage.get() is very cheap now (does not create new objects,
which is expensive in Python)
- use a dictionary-on-demand.
- Lots of code removed for this one, and a few lines added.
- items() function, which, contrary to common belief, preserves argument ordering.

Mike Looijmans made changes - 02/Dec/05 03:47 PM
Field Original Value New Value
Attachment modpython325_util_py_dict.patch [ 12321091 ]
Nicolas Lehuen made changes - 29/Dec/05 04:59 AM
Fix Version/s 3.3 [ 12310101 ]
Affects Version/s 3.3 [ 12310101 ]
Affects Version/s 3.2 [ 11060 ]
Jim Gallacher added a comment - 20/Mar/06 01:45 AM
I was checking the documentation with respect to this patch and I noticed a couple of things.

The FieldStorage constructor has 2 undocumented parameters, file_callback=None, field_callback=None. These were added in r160259 as part of Barry Pearce's fix for MODPYTHON-40. In the same patch, the make_file and make_field methods were added. These allow FieldStorage to be subclassed and provide an alternative to using the callbacks.

In Mike's modpython325_util_py_dict.patch, the make_field and make_file methods have be removed, with the reasoning that the callbacks should be sufficient for controlling the upload.

We never had a discussion of the merits of the 2 approaches - subclassing vs passing the callbacks in the FieldStorage constructor. Should we perhaps allow both, and reinstate make_field and make_file, or are the callbacks sufficient?

Further infomation on the callbacks, including some nice examples, can be found in the python-dev archives. See http://article.gmane.org/gmane.comp.apache.mod-python.devel/760

I'll update the docs as part of the fix for this issue.

Repository Revision Date User Message
ASF #387693 Wed Mar 22 02:53:48 UTC 2006 jgallacher FieldStorage efficiency improvement provided by
Mike Looijmans. Adapted FieldStorage documentation examples
from Barry Pearce.
(MODPYTHON-93)
Files Changed
MODIFY /httpd/mod_python/trunk/Doc/modpython4.tex
MODIFY /httpd/mod_python/trunk/lib/python/mod_python/util.py
MODIFY /httpd/mod_python/trunk/Doc/appendixc.tex

Jim Gallacher made changes - 22/Mar/06 11:31 AM
Status Open [ 1 ] Resolved [ 5 ]
Resolution Fixed [ 1 ]
Repository Revision Date User Message
ASF #393781 Thu Apr 13 10:50:36 UTC 2006 grahamd Backported MODPYTHON-93 from trunk to branches/3.2.x.
Files Changed
MODIFY /httpd/mod_python/branches/3.2.x/lib/python/mod_python/__init__.py
MODIFY /httpd/mod_python/branches/3.2.x/Doc/appendixc.tex
MODIFY /httpd/mod_python/branches/3.2.x/src/include/mpversion.h
MODIFY /httpd/mod_python/branches/3.2.x/Doc/modpython4.tex
MODIFY /httpd/mod_python/branches/3.2.x/lib/python/mod_python/util.py

Repository Revision Date User Message
ASF #416548 Fri Jun 23 02:56:11 UTC 2006 jgallacher As part of the improvments to FieldStorage, the Field class __init__ method
was changed in trunk (3.3-dev) and backported to branches/3.2.x. Although
the documentation states that the Field class is for internal use by
FieldStorage, some software applications create Field instances directly.
This change in the __init__ signature causes those applications to fail.

The code committed here is a work around that maintains forward compatibilty
with the FieldStorage improvements, while maintaining backward compatiblitly
with applications that may be using the Field class directly. (MODPYTHON-93)
Files Changed
MODIFY /httpd/mod_python/branches/3.2.x/lib/python/mod_python/util.py

Jim Gallacher added a comment - 23/Jun/06 09:53 AM
As part of the improvments to FieldStorage, the Field class __init__ method was changed in trunk (3.3-dev) and backported to branches/3.2.x. Although the documentation states that the Field class is for internal use by
FieldStorage, some software applications create Field instances directly. This change in the __init__ signature causes those applications to fail.

Breaking compatibility may be acceptable for a minor version increment (3.2 > 3.2) but is undesirable for a patch release (3.2.8 > 3.2.9). Changing the Field class to be both forward and backward compatible wil not be difficult. As such I'll commit a fix to branches/3.2.x in time for a 3.2.9 release.

Repository Revision Date User Message
ASF #417021 Sun Jun 25 14:40:44 UTC 2006 jgallacher Rolled back changes to Field and FieldStorage for MODPYTHON-93.
These changes were meant to improve FieldStorage efficiency, but
cause problems for applications such as Trac that treated
FieldStorage as a dictionary, or modified FieldStorage.list.
There is nothing in the documentation or source code stating that list
is private, or FieldStorage should be considered immutable, so these
applications were not wrong in doing so, and our changes should be
considered as unacceptable.
Files Changed
MODIFY /httpd/mod_python/branches/3.2.x/Doc/appendixc.tex
MODIFY /httpd/mod_python/branches/3.2.x/lib/python/mod_python/util.py

Jim Gallacher added a comment - 30/Jun/06 09:14 AM
Reopened issue due to problems experienced by some applications such as Trac.

We'll need to re-examine the code committed to make sure that FieldStorage behaves properly as a dictionary-like object.

Jim Gallacher made changes - 30/Jun/06 09:14 AM
Status Resolved [ 5 ] Reopened [ 4 ]
Resolution Fixed [ 1 ]
Graham Dumpleton added a comment - 11/Sep/06 10:17 AM
Have we decided that we will not try and be compatible with Trac and expect people using mod_python 3.3 to use the next unreleased version of Trac? This next version of Trac works via a WSGI gateway and thus doesn't use FieldStorage and thus doesn't have the problems seen here. Can we mark the original issue as resolved for 3.3?

Graham Dumpleton added a comment - 24/Oct/06 05:54 AM
I have looked at the code for this again and to somehow accommodate older versions of Trac is only going to introduce horrible little hacks in mod_python that we don't need, so it just isn't worth the trouble to try and support the older versions. People will simply need to upgrade to at least Trac 0.10.0 if they want to use mod_python 3.3.

If the hacks were done, there would be two parts to it. The first is that Trac derives from util.FieldStorage and adds a __setitem__() method. This would need to be overridden from the base class by replacing it with add_field().

  def __init__(self, ...):
    ...
    self.__setitem__ = self.add_field

It has to be done in the constructor and not just as a method, as doing it as a method means derived class overrides it. At that point in the constructor, derived class one is in place and so we effectively replace it.

The second part would be more difficult and why it isn't worth the trouble. The problem here is that although the derived class adds __setitem__(), it doesn't itself use it in the derived class constructor. Instead, it does the nasty thing of accessing the base class 'list' attribute directly, thereby duplicating the same code as in __setitem__() that we need to replace. The only way around this nastiness would be for the base class to override the 'append()' method of the 'list' attribute itself and do extra special magic to check for arguments of the old style and change things on the fly, updating the associated index attribute as well. This is just all too messy and magic and would surely come back and bite us.

In summary, not worth the trouble. Trac people will just have to upgrade.


Graham Dumpleton made changes - 24/Oct/06 05:54 AM
Status Reopened [ 4 ] Resolved [ 5 ]
Resolution Fixed [ 1 ]
Graham Dumpleton added a comment - 26/Oct/06 10:21 AM
In response to question about dictionary like abilities on mailing list, I actually had my first good look at what the performance optimisation changes did. Previously I only looked at just the bit I thought mattered to the Trac issue. Having done this, I notice there are actually some problems in the code. Specifically, if add_field() is used after the first time the 'dictionary' attribute is accessed, then the new field doesn't actually get put in the dictionary and therefore isn't visible to __getitem__(), has_key() etc. In other words, the 'dictionary' gets set to a snapshot of 'list' and isn't updated there after. I also note that changes which introduced the method add_field() aren't reflected in documentation either.

I'll spend some time reviewing the code and fixing things up. I might also make the 'dictionary' attribute a private attribute and perhaps look at direct changes to 'list' being automatically reflected in any index somehow. In respect of 'dictionary' I want to hide it so we avoid problems in the future like in Trac where they effectively delved into the implementation, making it hard to make changes.

Graham Dumpleton made changes - 26/Oct/06 10:21 AM
Assignee Jim Gallacher [ jgallacher ] Graham Dumpleton [ grahamd ]
Status Resolved [ 5 ] Reopened [ 4 ]
Resolution Fixed [ 1 ]
Graham Dumpleton made changes - 26/Oct/06 10:21 AM
Status Reopened [ 4 ] In Progress [ 3 ]
Mike Looijmans added a comment - 26/Oct/06 10:36 AM
A very simple approach would be to delete the dictionary attribute when
any change is made.

The add_field method should be made private, and one could add a
__setitem__ method like this:

     def __setitem__(self, key, value):
         if self.__dict__.has_key('dictionary'):
             del self.dictionary
         self._add_field(key, value)

A similar approach can be taken to add the __delitem__ functionality.


Mike Looijmans
Philips Natlab / Topic Automation



Repository Revision Date User Message
ASF #468187 Thu Oct 26 23:26:29 UTC 2006 grahamd (MODPYTHON-93) Fix up issues whereby quick lookup table onto list of form
fields isn't update when add_field() is used after first time that a lookup
of table is performed. The changes also allow direct changes to the list of
fields to be made with the lookup table being invalidated so that it will
be rebuilt the next time it is required.
Files Changed
MODIFY /httpd/mod_python/trunk/lib/python/mod_python/util.py
MODIFY /httpd/mod_python/trunk/lib/python/mod_python/__init__.py
MODIFY /httpd/mod_python/trunk/src/include/mpversion.h

Repository Revision Date User Message
ASF #468203 Fri Oct 27 00:18:08 UTC 2006 grahamd (MODPYTHON-93) Restored a level of backward compatibility for third party
packages which create instances of the Field class directly and insert them
direct into the list of form fields. This should mean that older versions
of Trac will still work.
Files Changed
MODIFY /httpd/mod_python/trunk/lib/python/mod_python/util.py

Repository Revision Date User Message
ASF #468211 Fri Oct 27 00:33:24 UTC 2006 grahamd (MODPYTHON-93) Logic check on when to trigger backwards compatibility mode
for Field class was wrong way around.
Files Changed
MODIFY /httpd/mod_python/trunk/lib/python/mod_python/util.py

Repository Revision Date User Message
ASF #468225 Fri Oct 27 01:44:03 UTC 2006 grahamd (MODPYTHON-93) Update documentation to list public interface of FieldStorage.
Files Changed
MODIFY /httpd/mod_python/trunk/Doc/modpython4.tex

Repository Revision Date User Message
ASF #468295 Fri Oct 27 06:32:07 UTC 2006 grahamd (MODPYTHON-93) No need to access keys of dictionary to determine len(), just
apply it to dictionary itself.
Files Changed
MODIFY /httpd/mod_python/trunk/lib/python/mod_python/util.py

Graham Dumpleton added a comment - 27/Oct/06 08:40 AM
I have committed some changes into repository now which
fixes the issue with add_field() but also does things in such a way that
code such as Trac which creates Field instances direct and adds them
to the the list of form fields will still work. It does this by using a
derived list class instance which invalidates the key lookup table on
changes to the list so that the lookup table is rebuilt the next time
it is required. Doing this wasn't as bad as I first thought it would be
as hadn't gone to the extent of properly understanding the code before.
Also include a semi hack in Field class constructor to detect old style
use where code expects more than just the name of the field as
argument and do appropriate initialisation as appropriate.

In all, what the changes allow is for the following to work:

from mod_python import util
from StringIO import StringIO

def index(req):
  req.content_type = 'text/plain'

  req.write('%s\n\n' % req.form.keys())

  for key in req.form.keys():
    req.write('%s %s\n' % (key, req.form.getlist(key)))

  req.write('\n')

  req.form.add_field('e', 'f')

  req.form.list.append(util.Field('g', StringIO('h'),
      "text/plain", {}, None, {}))
  req.form.list.extend([util.Field('i', StringIO('j'),
      "text/plain", {}, None, {})])
  req.form.list += [util.Field('k', StringIO('l'),
      "text/plain", {}, None, {})]

  req.write('%s\n\n' % req.form.keys())

  for key in req.form.keys():
    req.write('%s %s\n' % (key, req.form.getlist(key)))

  req.write('\n')

  return None

If this is accessed with:

  http://localhost:8002/~grahamd/form/?a=b&c=d&e=f

I get:

['a', 'c', 'e']

a [Field('a', 'b')]
c [Field('c', 'd')]
e [Field('e', 'f')]

['a', 'c', 'e', 'g', 'i', 'k']

a [Field('a', 'b')]
c [Field('c', 'd')]
e [Field('e', 'f'), Field('e', 'f')]
g [Field('g', 'h')]
i [Field('i', 'j')]
k [Field('k', 'l')]

As the req.form.keys() method works off the lookup table, you can see
how updates direct to list work okay, meaning stuff like Trac should
work.

You will also note that the 'dictionary' attribute has gone and is hidden
elsewhere. Hopefully people will not try and directly access the
equivalent, but some stern notes in the documentation saying only
to use documented interface might help. :-)

If we can test the updates and especially if someone can test it with
older versions of Trac that would be great, but even if we can't will
mark issue as resolved after a short wait.

Repository Revision Date User Message
ASF #468838 Sun Oct 29 06:17:06 UTC 2006 grahamd (MODPYTHON-93) Added clear(), __delitem__() and __setitem__() methods. The
__setitem__() method is an alias for add_field() and thus is additative
unlike a traditional dictionary.
Files Changed
MODIFY /httpd/mod_python/trunk/Doc/modpython4.tex
MODIFY /httpd/mod_python/trunk/lib/python/mod_python/util.py
MODIFY /httpd/mod_python/trunk/lib/python/mod_python/__init__.py
MODIFY /httpd/mod_python/trunk/src/include/mpversion.h

Repository Revision Date User Message
ASF #468841 Sun Oct 29 07:31:39 UTC 2006 grahamd (MODPYTHON-93) The __delitem__() wasn't throwing a KeyError exception when
the field name didn't exist.
Files Changed
MODIFY /httpd/mod_python/trunk/lib/python/mod_python/util.py

Repository Revision Date User Message
ASF #468842 Sun Oct 29 07:46:25 UTC 2006 grahamd (MODPYTHON-93) Missing __iter__() method to support 'for k in form'.
Files Changed
MODIFY /httpd/mod_python/trunk/lib/python/mod_python/util.py

Graham Dumpleton added a comment - 29/Oct/06 07:48 AM
Okay, bit the bullet and made FieldStorage a bit more dictionary like after a bit of discussion on the mailing lists by adding __setitem__(), __delitem__(), __iter__(), __repr__() and clear() on top of other dictionary like methods it already had. To say it is dictionary like is probably only true though for the case where there is a single value against all form fields. Once you start having multiple values against a form field it starts to behave a bit differently. That it isn't strictly the same has been documented.

First difference is that subscript operator when used to access fields returns the actual value when there is only one, but a list of values when there is more than one. To always ensure one only gets a single value, should use getfirst(), if always a list, use getlist() instead. The way the subscript operator behaves is the same as cgi.FieldStorage so not doing anything different here.

Second difference is that subscript operator when used to set the value of a field actually adds the value to an existing value if one already exists. In other words, it does not replace the existing value but adds another. This is because it uses add_field(). There is no equivalent in cgi.FieldStorage, but this behaviour is consistent with how Trac was overriding the class and providing its own __setitem__() method. If one really wants to supplant an existing value, should use del operator first to remove existing values.

So there is some strangeness in how it works. If anyone specifically sees a big problem with the __setitem__() method and how it works, speak up.

Repository Revision Date User Message
ASF #468843 Sun Oct 29 07:49:12 UTC 2006 grahamd (MODPYTHON-93) Missing an appropriate __repr__() method as well on
FieldStorage class.
Files Changed
MODIFY /httpd/mod_python/trunk/lib/python/mod_python/util.py

Graham Dumpleton added a comment - 05/Nov/06 06:49 AM
Not heard of anyone who has specifically tested to see if this works with older versions of Trac or not, but code passing tests on various Python/Apache versions so that's good enough for now.

Graham Dumpleton made changes - 05/Nov/06 06:49 AM
Resolution Fixed [ 1 ]
Status In Progress [ 3 ] Resolved [ 5 ]
Graham Dumpleton added a comment - 21/Nov/06 10:50 PM
After further experimentation, following the Trac model of having __setitem__() be an alias for add_field() is a pain in the neck. As reminder, the Trac way of doing things mean't that assigning using subscript operator resulted in additional field values being added rather than existing ones being replaced as the subscript operator works in every other dictionary like data structure.

Having it work this way, it is too easy to forget it is different and your code doesn't work as expected. For newbies this could cause a lot of grief. Thus, how Trac did things should be ignored and need to make __setitem__() replace all existing values for field like a normal dictionary. If someone wants to add additional values, they should use add_field() instead.

Doing this will not cause any problems for Trac compatibility as Trac overrides __setitem__() in a derived class anyway.

Graham Dumpleton made changes - 21/Nov/06 10:50 PM
Resolution Fixed [ 1 ]
Status Resolved [ 5 ] Reopened [ 4 ]
Graham Dumpleton made changes - 21/Nov/06 10:50 PM
Status Reopened [ 4 ] In Progress [ 3 ]
Repository Revision Date User Message
ASF #478129 Wed Nov 22 11:15:54 UTC 2006 grahamd (MODPYTHON-93) Made assignment using the subscript operator against the
FieldStorage class act like a dictionary. In other words, any existing values
against a field are replaced with a single value rather than an additional
value being added against the field.
Files Changed
MODIFY /httpd/mod_python/trunk/Doc/modpython4.tex
MODIFY /httpd/mod_python/trunk/lib/python/mod_python/util.py
MODIFY /httpd/mod_python/trunk/Doc/appendixc.tex

Graham Dumpleton made changes - 22/Nov/06 11:18 AM
Resolution Fixed [ 1 ]
Status In Progress [ 3 ] Resolved [ 5 ]
Graham Dumpleton made changes - 02/Apr/07 11:40 AM
Status Resolved [ 5 ] Closed [ 6 ]