Commons FileUpload
FILEUPLOAD-168

read form field parameters even if maxSize has been exceeded

    Details

• Type: Improvement
• Status: Resolved
• Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
• Labels: None
• Environment: commons fileupload 1.2.2-SNAPSHOT

      Description

      Hi!

This issue is similar to FILEUPLOAD-140. I can't seem to reopen it, so I created a new one instead. FILEUPLOAD-140 was marked as resolved by using the streaming API, if I'm not mistaken; no code change was made. But I disagree with resolving it by simply pointing at the streaming API (as detailed in http://commons.apache.org/fileupload/streaming.html).

First of all, I tried to upload a big file exceeding maxSize with the streaming API and got the SizeLimitExceededException before ANY parameter had been read.

      ServletFileUpload.getItemIterator() calls FileUploadBase.getItemIterator() which creates a new FileItemIteratorImpl(). In the constructor code of FileItemIteratorImpl, it already checks for the requestSize and throws SizeLimitExceededException if sizeMax is exceeded.
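For illustration, here is roughly what that looks like with the streaming API, following the example in the docs linked above (the 1 MB limit and the servlet plumbing are assumptions, not part of this report):

    import javax.servlet.http.HttpServletRequest;
    import org.apache.commons.fileupload.FileItemIterator;
    import org.apache.commons.fileupload.FileItemStream;
    import org.apache.commons.fileupload.servlet.ServletFileUpload;
    import org.apache.commons.fileupload.util.Streams;

    public class StreamingDemo {
        public void process(HttpServletRequest request) throws Exception {
            ServletFileUpload upload = new ServletFileUpload();
            upload.setSizeMax(1024 * 1024); // illustrative 1 MB request limit

            // If the request's Content-Length already exceeds sizeMax, this
            // call throws SizeLimitExceededException, before any item is read.
            FileItemIterator iter = upload.getItemIterator(request);
            while (iter.hasNext()) {
                FileItemStream item = iter.next();
                if (item.isFormField()) {
                    // Never reached for an oversized request.
                    String value = Streams.asString(item.openStream());
                }
            }
        }
    }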

      I'd like to open this discussion again and hope that in the end, we can have either:

• form field parameters appearing BEFORE the file parameter still get read when requestSize is greater than sizeMax, with parsing terminated once the file is reached
• all form field parameters still get read when requestSize is greater than sizeMax, but the bodies of the files are skipped, proceeding to the next 'boundary' so as not to keep the user waiting, if that is possible (preferred)

Then, we should also apply the same improvement to PortletFileUpload.

Attachments

1. fileupload2.patch
   4 kB
   Paul Rivera
2. fileupload1.patch
   2 kB
   Paul Rivera

        Activity

        Paul Rivera added a comment -

I've attached fileupload1.patch above just to get things started and to add more to our discussion.

        Paul Rivera added a comment -

        Uploaded a version of the patch named fileupload2.patch with some additional methods.

        F. Andy Seidl added a comment -

        Regarding the second idea:

        >>all form field parameters will still get read if requestSize is greater than sizeMax. But, we should skip reading the body of the files and proceed to the next 'boundary' so as not to keep the user waiting, if ever this is possible. (preferred) <<

        Because the posted data is delivered as a sequential stream, there is no way to "skip" to the next boundary other than reading the stream.

It would be possible to have the framework do this (i.e. read the complete file but discard it if it is too large), but an application could do that right now. E.g., suppose MY_MAX_SIZE is the maximum size of a file you are willing to accept; you could set sizeMax to, say, 3*MY_MAX_SIZE, allowing the framework to consume file content up to three times that size in order to reach the parameters after the file. Then, in your application, if you find the file size is greater than MY_MAX_SIZE, you discard it.
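For illustration only, that workaround might look like the following (MY_MAX_SIZE, the servlet plumbing, and the discard policy are assumptions, not project code):

    import java.util.List;
    import javax.servlet.http.HttpServletRequest;
    import org.apache.commons.fileupload.FileItem;
    import org.apache.commons.fileupload.disk.DiskFileItemFactory;
    import org.apache.commons.fileupload.servlet.ServletFileUpload;

    public class OversizeTolerantUpload {
        private static final long MY_MAX_SIZE = 10 * 1024 * 1024; // 10 MB, illustrative

        public void handle(HttpServletRequest request) throws Exception {
            ServletFileUpload upload = new ServletFileUpload(new DiskFileItemFactory());
            // Allow up to three times the acceptable file size, so the parser
            // can reach the fields that follow an oversized file.
            upload.setSizeMax(3 * MY_MAX_SIZE);

            List<FileItem> items = upload.parseRequest(request);
            for (FileItem item : items) {
                if (!item.isFormField() && item.getSize() > MY_MAX_SIZE) {
                    item.delete(); // discard the too-large file, keep the fields
                    continue;
                }
                // process form fields and acceptable files as usual
            }
        }
    }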

        Simon Kitching added a comment -

It seems to me that at least something needs to be done.

Because the size is checked even in streaming mode, there is no reasonable way to access data that follows the file body. Setting sizeMax to an artificially high level and then implementing a custom size check while reading from the stream is possible as a workaround, but is really pretty ugly.

I think the current behaviour should remain the default - AFAICT most people do not need the params that follow the file to be available (or there would have been complaints already). Obviously, reading all the file contents from the stream (and just discarding them) is always inefficient, and for most people unnecessary. But clearly for some people it is necessary.

I'm not sure about the attached patch. Simply continuing to return data when the size is exceeded seems pointless, as that data is clearly not needed. Instead, how about detecting the size overflow, entering a loop that reads and discards input until the end of that MIME part is found, and then throwing the exception? People who care could then catch the exception and keep going, since the input stream would be correctly positioned.
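As an application-level approximation of that read-and-discard idea under the streaming API (the helper name, buffer size, and return contract here are purely illustrative, not library code):

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    public final class CappedCopy {
        /**
         * Copies at most roughly 'limit' bytes from in to out; anything beyond
         * the limit is read and discarded so that the multipart parser stays
         * positioned at the next boundary. Returns false if the part exceeded
         * the limit, in which case the caller should discard what was written.
         */
        public static boolean copyWithCap(InputStream in, OutputStream out, long limit)
                throws IOException {
            byte[] buf = new byte[8192];
            long total = 0;
            for (int n; (n = in.read(buf)) != -1; ) {
                total += n;
                if (total <= limit) {
                    out.write(buf, 0, n);
                }
                // Over the limit: keep draining the stream, but stop storing.
            }
            return total <= limit;
        }
    }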

        Martin Cooper added a comment -

The semantics of sizeMax are specifically "the maximum allowed size of a complete request". That is, if the complete request is larger than sizeMax, it is rejected regardless of how many files or fields are being uploaded within it, because that is exactly what sizeMax is defined to do.

The semantics of fileSizeMax are "the maximum size permitted for a single uploaded file", which allows the entire request to be processed while rejecting any individual file that exceeds the specified maximum size for a single file.

Most of the issues raised in this report could be resolved by simply using fileSizeMax instead of sizeMax. Trying to use sizeMax for this does not make sense, because it does not make sense to allow a request to be processed that specifically violates the semantics of sizeMax.
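For illustration, the two limits described above set side by side; the numbers are arbitrary:

    import org.apache.commons.fileupload.disk.DiskFileItemFactory;
    import org.apache.commons.fileupload.servlet.ServletFileUpload;

    public class UploadLimits {
        public static ServletFileUpload newUpload() {
            ServletFileUpload upload = new ServletFileUpload(new DiskFileItemFactory());
            upload.setSizeMax(20 * 1024 * 1024);    // complete request may not exceed 20 MB
            upload.setFileSizeMax(5 * 1024 * 1024); // no single uploaded file may exceed 5 MB
            return upload;
        }
    }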

If that still doesn't suffice: the streaming mode was implemented to provide more flexibility than sizeMax and fileSizeMax can offer on their own, so if you need that additional flexibility, the streaming mode was designed for you.

        Jochen Wiedmann added a comment -

        Paul, I am sorry, but I won't do anything in that area.

First of all, as Martin has already pointed out, you should most probably be using fileSizeMax, not sizeMax.

Second, the limits are a security measure, designed to prevent DoS attacks. Obviously, your concern is not a DoS. But then you should ask yourself whether you need these limits at all, and whether it wouldn't be better to simply iterate over the parameters using the streaming API without throwing any exceptions, possibly discarding files that become too large.
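A sketch of that alternative, with no limits configured at all (the per-file cap is left to a caller-supplied helper, e.g. something like the copyWithCap() sketch earlier in this thread):

    import javax.servlet.http.HttpServletRequest;
    import org.apache.commons.fileupload.FileItemIterator;
    import org.apache.commons.fileupload.FileItemStream;
    import org.apache.commons.fileupload.servlet.ServletFileUpload;
    import org.apache.commons.fileupload.util.Streams;

    public class NoLimitsStreaming {
        public void process(HttpServletRequest request) throws Exception {
            ServletFileUpload upload = new ServletFileUpload();
            // Deliberately no setSizeMax()/setFileSizeMax(): no size limit
            // exceptions are thrown, so every part of the request is reached.
            FileItemIterator iter = upload.getItemIterator(request);
            while (iter.hasNext()) {
                FileItemStream item = iter.next();
                if (item.isFormField()) {
                    String value = Streams.asString(item.openStream());
                    // handle the field value
                } else {
                    // store up to an application-chosen cap and drain the rest,
                    // discarding files that become too large
                }
            }
        }
    }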

Third, no one in their right mind would ignore the Content-Length header field, since it allows throwing exceptions as early as possible. But that means you always have to expect such exceptions before the data you want has even been read.


          People

• Assignee: Jochen Wiedmann
• Reporter: Paul Rivera
• Votes: 0
• Watchers: 1
