Hadoop Common / HADOOP-4422

S3 file systems should not create bucket

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.18.1
    • Fix Version/s: 0.20.0
    • Component/s: fs/s3
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Modified Hadoop file system to no longer create S3 buckets. Applications can create buckets for their S3 file systems by other means, for example, using the JetS3t API.

      Description

      Both S3 file systems (s3 and s3n) try to create the bucket at every initialization. This is a problem because:

      • Every S3 operation costs money, so these redundant calls are a needless expense.
      • These calls can fail when made concurrently, which makes the file system unusable in large jobs.
      • Any operation, such as "fs -ls", creates a bucket. This is counter-intuitive and undesirable.

      The initialization code should assume the bucket exists:

      • Creating a bucket is a very rare operation. Accounts are limited to 100 buckets.
      • Any check at initialization for bucket existence is a waste of money.

      Per Amazon: "Because bucket operations work against a centralized, global resource space, it is not appropriate to make bucket create or delete calls on the high availability code path of your application. It is better to create or delete buckets in a separate initialization or setup routine that you run less often."
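
      The separation argued for above can be sketched as follows. This is a minimal illustration, not the attached patch: BucketService, InMemoryService, Store, and setup are hypothetical names invented here, and the in-memory fake stands in for a real S3 client so the sketch runs offline. Initialization makes no bucket calls at all, and bucket creation lives in a separate setup routine.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class BucketSetupSketch {

    /** Hypothetical stand-in for an S3 client; not a Hadoop or JetS3t type. */
    interface BucketService {
        void createBucket(String name) throws IOException;
        boolean objectExists(String bucket, String key) throws IOException;
    }

    /** In-memory fake that counts bucket-create calls so the sketch runs offline. */
    static final class InMemoryService implements BucketService {
        final Set<String> buckets = ConcurrentHashMap.newKeySet();
        int createCalls = 0;

        @Override
        public void createBucket(String name) {
            createCalls++;
            buckets.add(name);
        }

        @Override
        public boolean objectExists(String bucket, String key) throws IOException {
            if (!buckets.contains(bucket)) {
                throw new IOException("No such bucket: " + bucket);
            }
            return false; // the fake stores no objects
        }
    }

    /** File-system-like store whose initialization makes no bucket calls. */
    static final class Store {
        private final BucketService service;
        private final String bucket;

        Store(BucketService service, String bucket) {
            this.service = service;
            this.bucket = bucket; // assume the bucket exists: no create, no existence check
        }

        boolean exists(String key) throws IOException {
            // A missing bucket surfaces here, on first real use.
            return service.objectExists(bucket, key);
        }
    }

    /** One-off setup routine, run separately from (and far less often than) jobs. */
    static void setup(BucketService service, String bucket) throws IOException {
        service.createBucket(bucket);
    }

    public static void main(String[] args) throws IOException {
        InMemoryService service = new InMemoryService();
        setup(service, "example-bucket");
        // Many initializations, as in a large job, trigger no further bucket calls.
        for (int i = 0; i < 1000; i++) {
            new Store(service, "example-bucket");
        }
        System.out.println("bucket-create calls: " + service.createCalls);
    }
}
```

      With this shape, a misconfigured bucket name shows up as an error on the first real operation rather than being silently created, which is the trade-off the issue accepts.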

      1. hadoop-s3n-nocreate.patch
        0.9 kB
        David Phillips
      2. hadoop-s3n-nocreate.patch
        2 kB
        David Phillips

        Activity

        Robert Chansler added a comment -

        Edit release note for publication.

        Hudson added a comment -

        Integrated in Hadoop-trunk #670 (see http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/670/).
        HADOOP-4422. S3 file systems should not create bucket. Contributed by David Phillips.

        Tom White added a comment -

        I've just committed this. Thanks David!

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12394458/hadoop-s3n-nocreate.patch
        against trunk revision 719787.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3638/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3638/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3638/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3638/console

        This message is automatically generated.

        David Phillips added a comment -

        Never mind about the warnings from Jets3t during testing. They are expected.

        David Phillips added a comment -

        I ran the tests as follows after setting the correct test buckets and keys in src/test/hadoop-site.xml:

        ant -Dtestcase=Jets3tS3FileSystemContractTest test
        ant -Dtestcase=Jets3tNativeS3FileSystemContractTest test

        They seem to pass:

        Testsuite: org.apache.hadoop.fs.s3.Jets3tS3FileSystemContractTest
        Tests run: 25, Failures: 0, Errors: 0, Time elapsed: 131.575 sec
        Testsuite: org.apache.hadoop.fs.s3native.Jets3tNativeS3FileSystemContractTest
        Tests run: 26, Failures: 0, Errors: 0, Time elapsed: 52.694 sec

        However, they both produce hundreds of warnings:

        (s3) 2008-11-21 15:51:28,800 WARN httpclient.RestS3Service (RestS3Service.java:performRequest(317)) - Response '/%2Ftest' - Unexpected response code 404, expected 200
        (s3n) 2008-11-21 15:34:55,646 WARN httpclient.RestS3Service (RestS3Service.java:performRequest(317)) - Response '/test' - Unexpected response code 404, expected 200

        Any ideas?

        David Phillips added a comment -

        Bucket creation also removed from Jets3tFileSystemStore.

        Tom White added a comment -

        Cancelling patch pending change to Jets3tFileSystemStore.

        Tom White added a comment -

        For consistency, we should make the same change to Jets3tFileSystemStore.

        Also, regarding the tests, there are two unit tests, Jets3tS3FileSystemContractTest and Jets3tNativeS3FileSystemContractTest, which can be run manually to test the S3 integration. The only difference with this patch is that the buckets they run against must already exist, so I don't think any change to the tests is needed.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12392277/hadoop-s3n-nocreate.patch
        against trunk revision 705430.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3482/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3482/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3482/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3482/console

        This message is automatically generated.

        Tom White added a comment -

        Marked as an incompatible change, since existing code that relies on bucket creation will need to be changed.

        David Phillips added a comment -

        Patch applies with -p0 now (used git diff --no-prefix).

        Doug Cutting added a comment -

        This patch needs to be re-generated without the a/ b/ stuff for Hudson to be able to apply it. It must apply with 'patch -p 0 < foo.patch' when connected to trunk.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12392205/hadoop-s3n-nocreate.patch
        against trunk revision 705073.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3469/console

        This message is automatically generated.

        David Phillips added a comment -

        Simple patch that removes bucket creation.


          People

          • Assignee: David Phillips
          • Reporter: David Phillips
          • Votes: 1
          • Watchers: 3
