  1. Hadoop Common
  2. HADOOP-9454

Support multipart uploads for s3native

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4.0
    • Component/s: fs/s3
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The s3native filesystem is limited to 5 GB per file when uploading to S3. The newest version of jets3t, however, supports multipart uploads, which allow storing multi-TB files. While the s3 filesystem lets you bypass this restriction by uploading blocks, we need to write our data to Amazon's publicdatasets bucket, which is shared with others.

      Amazon has added a similar feature to its distribution of Hadoop, as has MapR.

      Please note that while this supports large copies, it does not yet support parallel copies. Unlike for uploads, jets3t does not yet expose an API that allows parallel copies without Hadoop controlling the threads.

      By default, this patch does not enable multipart uploads. To enable them, along with parallel uploads, add the following keys to your Hadoop config:

      <property>
        <name>fs.s3n.multipart.uploads.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>fs.s3n.multipart.uploads.block.size</name>
        <value>67108864</value>
      </property>
      <property>
        <name>fs.s3n.multipart.copy.block.size</name>
        <value>5368709120</value>
      </property>
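
      The two block sizes above are byte counts: 67108864 is 64 MiB and 5368709120 is 5 GiB. The following is a rough Python sketch of what those numbers imply, not code from the patch. It assumes that the upload block size is used as a fixed part size and that S3's limit of 10,000 parts per multipart upload applies; `part_count` and `max_object_size` are hypothetical helpers introduced only for illustration.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Values copied from the config keys above.
UPLOAD_BLOCK_SIZE = 67108864    # fs.s3n.multipart.uploads.block.size
COPY_BLOCK_SIZE = 5368709120    # fs.s3n.multipart.copy.block.size

# Assumption: S3 allows at most 10,000 parts per multipart upload.
S3_MAX_PARTS = 10_000


def part_count(file_size, block_size):
    """Parts needed if a file is cut into fixed-size blocks (the last may be short)."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    # Integer ceiling division, so very large sizes stay exact.
    return (file_size + block_size - 1) // block_size


def max_object_size(block_size, max_parts=S3_MAX_PARTS):
    """Largest object that fits in max_parts blocks of block_size bytes."""
    return block_size * max_parts


if __name__ == "__main__":
    print(UPLOAD_BLOCK_SIZE // MIB, "MiB per upload block")
    print(COPY_BLOCK_SIZE // GIB, "GiB per copy block")
    print(part_count(100 * GIB, UPLOAD_BLOCK_SIZE), "parts for a 100 GiB file")
    print(max_object_size(UPLOAD_BLOCK_SIZE) // GIB, "GiB maximum at this block size")
```

      Under these assumptions, a larger upload block size would be needed before a single file could grow past roughly 625 GiB.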

      Then create an /etc/hadoop/conf/jets3t.properties file with contents similar to:

      storage-service.internal-error-retry-max=5
      storage-service.disable-live-md5=false
      threaded-service.max-thread-count=20
      threaded-service.admin-max-thread-count=20
      s3service.max-thread-count=20
      s3service.admin-max-thread-count=20
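
      Before deploying a file like the one above, it can help to check that it parses and that the thread counts are the integers you expect. The following is a minimal Python sketch, assuming plain key=value lines; `parse_properties` and `thread_settings` are hypothetical helpers, not part of jets3t or Hadoop.

```python
SAMPLE = """\
storage-service.internal-error-retry-max=5
storage-service.disable-live-md5=false
threaded-service.max-thread-count=20
threaded-service.admin-max-thread-count=20
s3service.max-thread-count=20
s3service.admin-max-thread-count=20
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {line!r}")
        props[key.strip()] = value.strip()
    return props


def thread_settings(props):
    """Return every *thread-count entry as an integer, failing on bad values."""
    return {k: int(v) for k, v in props.items() if k.endswith("thread-count")}


if __name__ == "__main__":
    props = parse_properties(SAMPLE)
    for name, count in sorted(thread_settings(props).items()):
        print(name, count)
```

      To check a real file, read its contents and pass the text to `parse_properties` instead of `SAMPLE`.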

        Attachments

        1. HADOOP-9454-10.patch
          32 kB
          Jordan Mendelson
        2. HADOOP-9454-11.patch
          14 kB
          Akira Ajisaka
        3. HADOOP-9454-12.patch
          15 kB
          Akira Ajisaka

              People

              • Assignee:
                Akira Ajisaka (ajisakaa)
              • Reporter:
                Jordan Mendelson (aloisius)
              • Votes:
                4
              • Watchers:
                22
