Issue Details (XML | Word | Printable)

Key: FILEUPLOAD-94
Type: Improvement Improvement
Status: Closed Closed
Resolution: Fixed
Priority: Minor Minor
Assignee: Unassigned
Reporter: Mike Samuel
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Commons FileUpload

Fileupload fails for forms with a large number of inputs

Created: 01/Nov/03 04:59 AM   Updated: 09/Mar/07 08:32 PM
Return to search
Component/s: None
Affects Version/s: 1.0 Final
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments:
  Size
Text File fileupload-memory.patch 2004-10-16 03:25 AM Charles Oliver Nutter 2 kB
Text File fileupload-patch 2003-11-01 05:01 AM Mike Samuel 8 kB
Java Source File HDFileItemFactory.java 2004-10-08 05:42 PM Wolfgang Knauf 3 kB
Java Source File TestServlet.java 2003-11-01 05:00 AM Mike Samuel 4 kB
Environment:
Operating System: Linux
Platform: PC

Bugzilla Id: 24306


 Description  « Hide
Attached is a test case servlet which repeats the problem with the Jun 24, 2003
fileupload-1.0 release.

FileUploadBase.parseRequest runs out of memory when parsing a form with a large
number of inputs. The cause seems to be DeferredFileOutputStream which
allocates a ByteArrayOutputStream per input, each of which preallocates a buffer
of length inMemoryThreshold. The in memory threshold defaults to 10k, but if it
is made larger (> 1 M in our environment), then the vm quickly runs out of
memory. Most of this memory is wasted since most files are ~5k, and almost all
non-file inputs are less than 1k.

I patched DeferredFileOutputStream to use a different underlying in memory
stream. I don't really know why ByteArrayOutputStream uses a single byte[],
since you can't do random access on the underlying buffer anyway.
I think the patch is something that could be incorporated into the default file
upload implementation without any noticable change in performance.



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Mike Samuel added a comment - 01/Nov/03 05:00 AM
Created an attachment (id=8857)
a simple servlet that repeats the error

Mike Samuel added a comment - 01/Nov/03 05:01 AM
Created an attachment (id=8858)
patch to fix the bug

Martin Cooper added a comment - 01/Nov/03 05:50 AM
I'm not sure I understand what you're doing. Are you trying to keep everything
in memory, and prevent anything from being written to disk? If so, why are you
trying to use *Disk*FileUpload? You should be using a custom implementation for
that. If that's not what you're trying to do, then why are you setting the in-
memory threshold so high?

The idea behind the disk implementation is that small fields - typically the
non-file fields - will be kept in memory, while the larger ones will be written
to disk. I can see that perhaps pre-allocating the threshold value could be
changed to pre-allocate a smaller amount if the threshold value is high - but
the point of the default implementation is that it shouldn't be high in the
first place.

I don't understand you're point about ByteArrayOutputStream and byte[]. Why are
you trying to do random access on what is intended to be just a form field?
Again, it really sounds like you need a custom implementation for whatever it
is you're trying to do, rather than trying to bend the default implementation
to meet your needs. That is, after all, why FileUpload is designed to be easily
customised.


Mike Samuel added a comment - 01/Nov/03 06:04 AM
The scenario I'm dealing is a form with a lot of non-form inputs, and 1 form
input that I want to process in memory when possible, but write to disk when
not. I chose a higher in memory threshold because that seems to get the best
throughput in our situation.

My main issue with the current design is that it preallocates the in memory
threshold *even for non-file inputs*.

My point about ByteArrayOutputStream and byte[] was that since you can't do
random access on the underlying byte[] because it's private, there's no point in
using a single byte array. It's more efficient from both a memory and processor
standpoint to stream into a linked list of byte buffers than expand a single
buffer, incurring a copy penalty.


Martin Cooper added a comment - 01/Feb/04 01:48 PM
Changing from a bug to an enhancement.

I intend to resolve this by using the ByteArrayOutputStream implementation from
Commons IO, which is similar to the implementation in the patch attached to
this enhancement request. However, Commons IO is thus far an unreleased
component, so I'm going to wait for that to be released first.


Joe Germuska added a comment - 24/Mar/04 11:11 PM
Would it make sense to copy that one class into commons-fileupload rather than
incur a dependency on all of commons-io ? It seems like this has recently come
up as a strategy for dealing with commons interdependencies, and it seems like
there are cases where it is justifiable.

Martin Cooper added a comment - 25/Mar/04 06:52 AM
Perhaps, if it was just that one class. However, two classes from FileUpload
have moved (well, there are copies for now) to IO, and another class has been
added to IO that needs to be picked up and used in FileUpload, so there are 4
classes in IO that FileUpload wants. I think at that point that a dependency is
warranted.

Wolfgang Knauf added a comment - 08/Oct/04 05:41 PM
I encountered this problem, too.
My form contained about 15 form fields and 5 file upload fields.
My threshold was 1 MB and so I wasted about 20 MB.

Simple workaround: I subclassed DefaultFileItemFactory and added a
property "sizeThresholdFormField". In "createItem" I created a default FileItem
if "isFormField" was false. For form fields I created a FileItem with a
threshold of "sizeThresholdFormField".
So the waste of memory was much lower for form fields. On the other hand: if a
textarea contains a lot of text it could be stored in a file, which is a loss
of performance.

Usage:
DiskFileUpload fileUpload = new DiskFileUpload (new HDFileItemFactory(1*1024
*1024, 1024, repository) );

This created a FileUpload with thresholds of 1 MB for files and 1 KB for form
fields.
In the above sample I would need 5 MB + some KB instead of 20 MB.


Wolfgang Knauf added a comment - 08/Oct/04 05:42 PM
Created an attachment (id=12996)
FileItemFactory with different Thresholds for file items and form fields

Martin Cooper added a comment - 11/Oct/04 11:35 AM
Fixed in the 10/11/04 nightly builds of Commons FileUpload and Commons IO.

FileUpload has been changed to depend on Commons IO, and Commons IO has been
changed to depend on its own version of ByteArrayOutputStream instead of the
java.io one, thus eliminating the need to preallocate the entire buffer size.


Charles Oliver Nutter added a comment - 16/Oct/04 03:25 AM
Created an attachment (id=13106)
Patch to DefaultFileItemFactory from FILEUPLOAD_1_0

Charles Oliver Nutter added a comment - 16/Oct/04 03:45 AM
Added a patch based on FileUpload 1.0 (FILEUPLOAD_1_0 tag) that uses 1024 for
non-file form fields.

Incidentally, isn't it a bit of a misnomer to call this value a "threshhold"?
ByteArrayOutputStream will grow the array if necessary to accommodate incoming
data. The problem here was that, for performance reasons, FileUpload asked the
VM to always allocate a large amount of memory, regardless of what would
actually be needed. When there were many form fields (as in our case, where we
have SELECTs sending hundreds of values) the request became unprocessable. Each
of the thousand fields tried to allocate several hundred K of memory.

We are using the attached patch for our production system now.


Martin Cooper added a comment - 16/Oct/04 09:52 AM
I'm unclear as to why you are adding a patch here - if you check the log you'll
see that this bug was fixed four days ago. As for why the value is called a
threshold, it's because this is the threshold below which the item is retained
in memory and above which it is written to disk.