jclouds / JCLOUDS-1521

Automatic computation of content length for input streams


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.1.1
    • Fix Version/s: None
    • Component/s: jclouds-blobstore
    • Labels: None

    Description

      I have a REST API that allows uploads of potentially large files (up to 4 GB). Due to their size, I cannot load these files in memory, as that could quickly crash my application. I also don't want to store them as temporary files, since that could fill up my disk if a lot of people decide to upload at the same time.

      Instead, I want to process the incoming files as InputStreams and forward them to the S3 object store. I understand that this is not possible directly, since S3 requires the content length to be known before the upload. However, I saw on StackOverflow (https://stackoverflow.com/questions/8653146/can-i-stream-a-file-upload-to-s3-without-a-content-length-header) that it's possible to work around this problem by reading the InputStream in memory in chunks of 5 (or more) MB and uploading these chunks via the S3 multipart upload API. Since S3 allows up to 10,000 parts per multipart upload, 5 MB chunks are more than enough for a 4 GB file (around 820 parts). As a result, I assume that I'll be able to upload a 4 GB file while having no more than 5 MB of its content in memory at any given time.
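      For reference, here's roughly what that workaround looks like on top of jclouds' low-level multipart API (initiateMultipartUpload / uploadMultipartPart / completeMultipartUpload). It's only a sketch - the StreamingUpload class, the readFully helper, and the fixed 5 MB chunk size are my own names and choices, not part of jclouds:

      import java.io.InputStream;
      import java.util.ArrayList;
      import java.util.Arrays;
      import java.util.List;

      import org.jclouds.blobstore.BlobStore;
      import org.jclouds.blobstore.domain.BlobMetadata;
      import org.jclouds.blobstore.domain.MultipartPart;
      import org.jclouds.blobstore.domain.MultipartUpload;
      import org.jclouds.blobstore.options.PutOptions;
      import org.jclouds.io.Payload;
      import org.jclouds.io.Payloads;

      public final class StreamingUpload {

          // S3's minimum size for every part except the last one.
          private static final int CHUNK_SIZE = 5 * 1024 * 1024;

          public static void upload(BlobStore blobStore, String container,
                                    String name, InputStream in) throws Exception {
              BlobMetadata metadata = blobStore.blobBuilder(name).build().getMetadata();
              MultipartUpload mpu = blobStore.initiateMultipartUpload(
                      container, metadata, new PutOptions());
              List<MultipartPart> parts = new ArrayList<MultipartPart>();
              try {
                  byte[] buffer = new byte[CHUNK_SIZE];
                  int filled;
                  int partNumber = 1;
                  // Fill the buffer before each upload, so at most CHUNK_SIZE
                  // bytes of the stream are held in memory at any given time.
                  while ((filled = readFully(in, buffer)) > 0) {
                      // ByteArrayPayload derives the content length from the array size.
                      Payload payload = Payloads.newByteArrayPayload(
                              Arrays.copyOf(buffer, filled));
                      parts.add(blobStore.uploadMultipartPart(mpu, partNumber++, payload));
                  }
                  blobStore.completeMultipartUpload(mpu, parts);
              } catch (Exception e) {
                  blobStore.abortMultipartUpload(mpu);
                  throw e;
              }
          }

          // Reads until the buffer is full or the stream ends, so every part
          // except the last is exactly CHUNK_SIZE bytes long.
          private static int readFully(InputStream in, byte[] buffer) throws Exception {
              int total = 0;
              int n;
              while (total < buffer.length
                      && (n = in.read(buffer, total, buffer.length - total)) >= 0) {
                  total += n;
              }
              return total;
          }
      }

      This keeps memory bounded at one chunk per concurrent upload, but it's boilerplate that every jclouds user with this use case currently has to reinvent - hence this issue.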

      I hoped jclouds (version 2.1.1) could do this chunking for me through its high-level API, but I've hit a problem. I have the following code:

      Blob blob = blobStore.blobBuilder(name)
          .payload(inputStream)
          ...
          .build();
      blobStore.putBlob(container, blob, PutOptions.Builder.multipart());
      

      If I run it like this, I get a NullPointerException, because I didn't specify the content's length:

      java.lang.NullPointerException: while trying to invoke the method java.lang.Long.longValue() of a null object returned from org.jclouds.io.MutableContentMetadata.getContentLength()
         at org.jclouds.blobstore.internal.BaseBlobStore.putMultipartBlob(BaseBlobStore.java:356)
         at org.jclouds.blobstore.internal.BaseBlobStore.putMultipartBlob(BaseBlobStore.java:347)
         at org.jclouds.aws.s3.blobstore.AWSS3BlobStore.putBlob(AWSS3BlobStore.java:79)
      

      I think it would be possible for jclouds to handle this dynamically, without knowing the size of the InputStream up front:

      1. Slice the stream into chunks of X MB and store each chunk in memory (where X has a default value but is also configurable).
      2. Upload the chunks sequentially - the content length header of each part can be set to X MB.
      3. Finalize the multipart upload.

      That way, no more than X MB will be held in memory for any given upload. A rough sketch of what this could look like follows below.
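      To make the proposal concrete, here's a rough sketch (not actual jclouds code) of how BaseBlobStore.putMultipartBlob, where the NPE above originates, could branch when the length is missing. putMultipartStream is a hypothetical new method implementing steps 1-3:

      protected String putMultipartBlob(String container, Blob blob, PutOptions overrides) {
          Long contentLength = blob.getMetadata().getContentMetadata().getContentLength();
          if (contentLength == null) {
              // Proposed: slice the InputStream into X MB chunks as it is read,
              // uploading each chunk as one part, instead of throwing an NPE.
              return putMultipartStream(container, blob, overrides);
          }
          // Existing behavior: derive part sizes from the known content length.
          ...
      }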

      Would you accept a pull request for this?

      PS: I've set the priority to Blocker because we really can't use jclouds for our uploads right now, due to the memory and disk space concerns listed above.


          People

            Assignee: Unassigned
            Reporter: Alexander Tsvetkov (nictas)
            Votes: 1
            Watchers: 3


              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1h 50m