[HADOOP-9454] Support multipart uploads for s3native - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.4.0
Component/s: fs/s3
Labels: None
Hadoop Flags: Reviewed

Description

The s3native filesystem is limited to 5 GB file uploads to S3. However, the newest version of jets3t supports multipart uploads, which allow storing multi-TB files. The s3 filesystem lets you bypass this restriction by uploading blocks, but we need to write our data to Amazon's publicdatasets bucket, which is shared with others.

Amazon has added a similar feature to its distribution of Hadoop, as has MapR.

Please note that while this patch supports large copies, it does not yet support parallel copies. Unlike for uploads, jets3t does not yet expose an API that allows parallel copies without Hadoop controlling the threads itself.

By default, this patch does not enable multipart uploads. To enable them, along with parallel uploads:

Add the following keys to your Hadoop config:

<property>
  <name>fs.s3n.multipart.uploads.enabled</name>
  <value>true</value>
</property>
<property>
  <name>fs.s3n.multipart.uploads.block.size</name>
  <value>67108864</value>
</property>
<property>
  <name>fs.s3n.multipart.copy.block.size</name>
  <value>5368709120</value>
</property>

Create a /etc/hadoop/conf/jets3t.properties file with contents similar to:

storage-service.internal-error-retry-max=5
storage-service.disable-live-md5=false
threaded-service.max-thread-count=20
threaded-service.admin-max-thread-count=20
s3service.max-thread-count=20
s3service.admin-max-thread-count=20
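
To get a feel for what the block sizes above imply, here is a minimal sketch that counts the parts a large upload or copy would need. It rests on two assumptions not stated in this issue: that the configured block size is used directly as the part size, and that S3 caps a multipart upload at 10,000 parts. The class and method names (MultipartPlan, partCount, maxObjectSize) are hypothetical and are not part of Hadoop or jets3t.

```java
// Hypothetical helper, not part of Hadoop or jets3t: estimates part counts
// for the block sizes configured above.
final class MultipartPlan {
    // Assumption: S3 allows at most 10,000 parts per multipart upload.
    static final long MAX_PARTS = 10_000L;
    // Value of fs.s3n.multipart.uploads.block.size above (64 MiB).
    static final long UPLOAD_BLOCK = 67_108_864L;
    // Value of fs.s3n.multipart.copy.block.size above (5 GiB).
    static final long COPY_BLOCK = 5_368_709_120L;

    /** Number of parts needed for objectSize bytes, rounding up. */
    static long partCount(long objectSize, long blockSize) {
        if (objectSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (objectSize + blockSize - 1) / blockSize;
    }

    /** Largest object that fits in MAX_PARTS parts of blockSize bytes. */
    static long maxObjectSize(long blockSize) {
        return Math.multiplyExact(MAX_PARTS, blockSize);
    }

    public static void main(String[] args) {
        long oneTiB = 1L << 40;
        System.out.println(partCount(oneTiB, UPLOAD_BLOCK)); // 16384
        System.out.println(maxObjectSize(UPLOAD_BLOCK));     // 671088640000
        System.out.println(partCount(oneTiB, COPY_BLOCK));   // 205
    }
}
```

Under these assumptions, a 1 TiB upload at 64 MiB per part would need 16384 parts, which is over the assumed limit, so multi-TB files would likely call for a larger fs.s3n.multipart.uploads.block.size.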

Attachments

HADOOP-9454-10.patch, 18/Jun/13 01:14, 32 kB, Jordan Mendelson
HADOOP-9454-11.patch, 13/Feb/14 22:08, 14 kB, Akira Ajisaka
HADOOP-9454-12.patch, 13/Feb/14 23:27, 15 kB, Akira Ajisaka

Issue Links

is related to: HADOOP-10400 Incorporate new S3A FileSystem implementation (Closed)

supersedes: HADOOP-8136 Enhance hadoop to use a newer version (0.8.1) of the jets3t library (Resolved)

People

Assignee: Akira Ajisaka

Reporter: Jordan Mendelson

Votes: 4

Watchers: 22

Dates

Created: 04/Apr/13 01:14

Updated: 07/May/14 11:09

Resolved: 26/Feb/14 20:35