|
What it does:
- Simplifies the creation of StringField objects. This was already marked as a TODO in the 3.2.5b code. - Does not create a file object (cStringIO) for each and every field. - FieldStorage.get() will always return the same instance(s) for any given name. - FieldStorage.get() is very cheap now (does not create new objects, which is expensive in Python) - use a dictionary-on-demand. - Lots of code removed for this one, and a few lines added. - items() function, which, contrary to common belief, preserves argument ordering. I was checking the documentation with respect to this patch and I noticed a couple of things.
The FieldStorage constructor has 2 undocumented parameters, file_callback=None, field_callback=None. These were added in r160259 as part of Barry Pearce's fix for In Mike's modpython325_util_py_dict.patch, the make_field and make_file methods have be removed, with the reasoning that the callbacks should be sufficient for controlling the upload. We never had a discussion of the merits of the 2 approaches - subclassing vs passing the callbacks in the FieldStorage constructor. Should we perhaps allow both, and reinstate make_field and make_file, or are the callbacks sufficient? Further infomation on the callbacks, including some nice examples, can be found in the python-dev archives. See http://article.gmane.org/gmane.comp.apache.mod-python.devel/760 I'll update the docs as part of the fix for this issue. As part of the improvments to FieldStorage, the Field class __init__ method was changed in trunk (3.3-dev) and backported to branches/3.2.x. Although the documentation states that the Field class is for internal use by
FieldStorage, some software applications create Field instances directly. This change in the __init__ signature causes those applications to fail. Breaking compatibility may be acceptable for a minor version increment (3.2 > 3.2) but is undesirable for a patch release (3.2.8 > 3.2.9). Changing the Field class to be both forward and backward compatible wil not be difficult. As such I'll commit a fix to branches/3.2.x in time for a 3.2.9 release. Reopened issue due to problems experienced by some applications such as Trac.
We'll need to re-examine the code committed to make sure that FieldStorage behaves properly as a dictionary-like object. Have we decided that we will not try and be compatible with Trac and expect people using mod_python 3.3 to use the next unreleased version of Trac? This next version of Trac works via a WSGI gateway and thus doesn't use FieldStorage and thus doesn't have the problems seen here. Can we mark the original issue as resolved for 3.3?
I have looked at the code for this again and to somehow accommodate older versions of Trac is only going to introduce horrible little hacks in mod_python that we don't need, so it just isn't worth the trouble to try and support the older versions. People will simply need to upgrade to at least Trac 0.10.0 if they want to use mod_python 3.3.
If the hacks were done, there would be two parts to it. The first is that Trac derives from util.FieldStorage and adds a __setitem__() method. This would need to be overridden from the base class by replacing it with add_field(). def __init__(self, ...): ... self.__setitem__ = self.add_field It has to be done in the constructor and not just as a method, as doing it as a method means derived class overrides it. At that point in the constructor, derived class one is in place and so we effectively replace it. The second part would be more difficult and why it isn't worth the trouble. The problem here is that although the derived class adds __setitem__(), it doesn't itself use it in the derived class constructor. Instead, it does the nasty thing of accessing the base class 'list' attribute directly, thereby duplicating the same code as in __setitem__() that we need to replace. The only way around this nastiness would be for the base class to override the 'append()' method of the 'list' attribute itself and do extra special magic to check for arguments of the old style and change things on the fly, updating the associated index attribute as well. This is just all too messy and magic and would surely come back and bite us. In summary, not worth the trouble. Trac people will just have to upgrade. In response to question about dictionary like abilities on mailing list, I actually had my first good look at what the performance optimisation changes did. Previously I only looked at just the bit I thought mattered to the Trac issue. Having done this, I notice there are actually some problems in the code. Specifically, if add_field() is used after the first time the 'dictionary' attribute is accessed, then the new field doesn't actually get put in the dictionary and therefore isn't visible to __getitem__(), has_key() etc. In other words, the 'dictionary' gets set to a snapshot of 'list' and isn't updated there after. I also note that changes which introduced the method add_field() aren't reflected in documentation either.
I'll spend some time reviewing the code and fixing things up. I might also make the 'dictionary' attribute a private attribute and perhaps look at direct changes to 'list' being automatically reflected in any index somehow. In respect of 'dictionary' I want to hide it so we avoid problems in the future like in Trac where they effectively delved into the implementation, making it hard to make changes. A very simple approach would be to delete the dictionary attribute when
any change is made. The add_field method should be made private, and one could add a __setitem__ method like this: def __setitem__(self, key, value): if self.__dict__.has_key('dictionary'): del self.dictionary self._add_field(key, value) A similar approach can be taken to add the __delitem__ functionality. Mike Looijmans Philips Natlab / Topic Automation I have committed some changes into repository now which
fixes the issue with add_field() but also does things in such a way that code such as Trac which creates Field instances direct and adds them to the the list of form fields will still work. It does this by using a derived list class instance which invalidates the key lookup table on changes to the list so that the lookup table is rebuilt the next time it is required. Doing this wasn't as bad as I first thought it would be as hadn't gone to the extent of properly understanding the code before. Also include a semi hack in Field class constructor to detect old style use where code expects more than just the name of the field as argument and do appropriate initialisation as appropriate. In all, what the changes allow is for the following to work: from mod_python import util from StringIO import StringIO def index(req): req.content_type = 'text/plain' req.write('%s\n\n' % req.form.keys()) for key in req.form.keys(): req.write('%s %s\n' % (key, req.form.getlist(key))) req.write('\n') req.form.add_field('e', 'f') req.form.list.append(util.Field('g', StringIO('h'), "text/plain", {}, None, {})) req.form.list.extend([util.Field('i', StringIO('j'), "text/plain", {}, None, {})]) req.form.list += [util.Field('k', StringIO('l'), "text/plain", {}, None, {})] req.write('%s\n\n' % req.form.keys()) for key in req.form.keys(): req.write('%s %s\n' % (key, req.form.getlist(key))) req.write('\n') return None If this is accessed with: http://localhost:8002/~grahamd/form/?a=b&c=d&e=f I get: ['a', 'c', 'e'] a [Field('a', 'b')] c [Field('c', 'd')] e [Field('e', 'f')] ['a', 'c', 'e', 'g', 'i', 'k'] a [Field('a', 'b')] c [Field('c', 'd')] e [Field('e', 'f'), Field('e', 'f')] g [Field('g', 'h')] i [Field('i', 'j')] k [Field('k', 'l')] As the req.form.keys() method works off the lookup table, you can see how updates direct to list work okay, meaning stuff like Trac should work. You will also note that the 'dictionary' attribute has gone and is hidden elsewhere. Hopefully people will not try and directly access the equivalent, but some stern notes in the documentation saying only to use documented interface might help. :-) If we can test the updates and especially if someone can test it with older versions of Trac that would be great, but even if we can't will mark issue as resolved after a short wait. Okay, bit the bullet and made FieldStorage a bit more dictionary like after a bit of discussion on the mailing lists by adding __setitem__(), __delitem__(), __iter__(), __repr__() and clear() on top of other dictionary like methods it already had. To say it is dictionary like is probably only true though for the case where there is a single value against all form fields. Once you start having multiple values against a form field it starts to behave a bit differently. That it isn't strictly the same has been documented.
First difference is that subscript operator when used to access fields returns the actual value when there is only one, but a list of values when there is more than one. To always ensure one only gets a single value, should use getfirst(), if always a list, use getlist() instead. The way the subscript operator behaves is the same as cgi.FieldStorage so not doing anything different here. Second difference is that subscript operator when used to set the value of a field actually adds the value to an existing value if one already exists. In other words, it does not replace the existing value but adds another. This is because it uses add_field(). There is no equivalent in cgi.FieldStorage, but this behaviour is consistent with how Trac was overriding the class and providing its own __setitem__() method. If one really wants to supplant an existing value, should use del operator first to remove existing values. So there is some strangeness in how it works. If anyone specifically sees a big problem with the __setitem__() method and how it works, speak up. Not heard of anyone who has specifically tested to see if this works with older versions of Trac or not, but code passing tests on various Python/Apache versions so that's good enough for now.
After further experimentation, following the Trac model of having __setitem__() be an alias for add_field() is a pain in the neck. As reminder, the Trac way of doing things mean't that assigning using subscript operator resulted in additional field values being added rather than existing ones being replaced as the subscript operator works in every other dictionary like data structure.
Having it work this way, it is too easy to forget it is different and your code doesn't work as expected. For newbies this could cause a lot of grief. Thus, how Trac did things should be ignored and need to make __setitem__() replace all existing values for field like a normal dictionary. If someone wants to add additional values, they should use add_field() instead. Doing this will not cause any problems for Trac compatibility as Trac overrides __setitem__() in a derived class anyway. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
class FieldStorage:
def __init__():
self.index = {} # holds references to the Field objects and can be used as an index.
self.list = []
... blah blah blah ...
... create the field and get it's name ...
self.list.append(field)
if name in self.index:
self.index[name].append(field)
else:
self.index[name] = [field,]
def __getitem__(self, key):
if key not in self.index:
raise KeyError, key
found = self.index[key]
if len(found) == 1:
return found[0]
else:
return found
The other FieldStorage methods would need to be refactored to take advantage of the index.