[HADOOP-4422] S3 file systems should not create bucket - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.18.1
Fix Version/s: 0.20.0
Component/s: fs/s3
Labels: None
Hadoop Flags: Incompatible change, Reviewed
Release Note: Modified Hadoop file system to no longer create S3 buckets. Applications can create buckets for their S3 file systems by other means, for example, using the JetS3t API.

Description

Both S3 file systems (s3 and s3n) try to create the bucket at every initialization. This is bad because:

- Every S3 operation costs money, so these redundant calls are an avoidable expense.
- These calls can fail when issued concurrently, which makes the file system unusable in large jobs.
- Any operation, such as "fs -ls", creates a bucket. This is counter-intuitive and undesirable.

The initialization code should assume the bucket exists:

- Creating a bucket is a very rare operation. Accounts are limited to 100 buckets.
- Any check at initialization for bucket existence is a waste of money.

Per Amazon: "Because bucket operations work against a centralized, global resource space, it is not appropriate to make bucket create or delete calls on the high availability code path of your application. It is better to create or delete buckets in a separate initialization or setup routine that you run less often."
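
The proposed behavior can be illustrated with a minimal, self-contained sketch. It does not use the real Hadoop or JetS3t classes: `BucketService`, `CountingBucketService`, and `SketchFileSystem` are hypothetical names invented for illustration, and the in-memory service only counts calls that would be billable against a real store. The point is that `initialize()` records the bucket name without contacting the service, while bucket creation lives in a separate one-time setup step.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a remote bucket service (not a real S3 API).
interface BucketService {
    void createBucket(String name);
    boolean objectExists(String bucket, String key);
}

// In-memory implementation that counts every call a real store would bill for.
class CountingBucketService implements BucketService {
    private final Set<String> buckets = new HashSet<>();
    int calls = 0;

    public void createBucket(String name) {
        calls++;
        buckets.add(name);
    }

    public boolean objectExists(String bucket, String key) {
        calls++;
        if (!buckets.contains(bucket)) {
            // A missing bucket surfaces on first real use, not at initialization.
            throw new IllegalStateException("No such bucket: " + bucket);
        }
        return false; // this sketch stores no objects
    }

    boolean hasBucket(String name) {
        return buckets.contains(name);
    }
}

// Hypothetical file system client: initialization assumes the bucket exists.
class SketchFileSystem {
    private final BucketService service;
    private String bucket;

    SketchFileSystem(BucketService service) {
        this.service = service;
    }

    // Records the bucket name only; no create call and no existence check.
    void initialize(String bucket) {
        this.bucket = bucket;
    }

    boolean exists(String key) {
        return service.objectExists(bucket, key);
    }
}

public class BucketSetupSketch {
    public static void main(String[] args) {
        CountingBucketService service = new CountingBucketService();

        // One-time setup routine, run separately from normal jobs.
        service.createBucket("example-bucket");

        // Hot path: initializing the file system adds no further calls.
        SketchFileSystem fs = new SketchFileSystem(service);
        fs.initialize("example-bucket");
        System.out.println("calls after initialize: " + service.calls);
    }
}
```

With this split, a job that initializes the file system many times in parallel never issues a bucket call, and a missing bucket is reported by the first real operation instead.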

Attachments

- hadoop-s3n-nocreate.patch, 16/Oct/08 19:57, 0.9 kB, David Phillips
- hadoop-s3n-nocreate.patch, 21/Nov/08 22:25, 2 kB, David Phillips

People

Assignee: David Phillips
Reporter: David Phillips
Votes: 1
Watchers: 3

Dates

Created: 15/Oct/08 21:43
Updated: 23/Apr/09 19:17
Resolved: 25/Nov/08 12:03