Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4.0
    • Component/s: fs/s3
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      The s3native filesystem is limited to 5 GB file uploads to S3; however, the newest version of jets3t supports multipart uploads, which allow multi-TB files to be stored. While the s3 filesystem lets you bypass this restriction by uploading blocks, we need to output our data into Amazon's publicdatasets bucket, which is shared with others.

      Amazon has added a similar feature to their distribution of Hadoop, as has MapR.

      Please note that while this supports large copies, it does not yet support parallel copies, because jets3t does not yet expose an API that allows them without Hadoop controlling the threads itself (unlike with uploads).

      By default, this patch does not enable multipart uploads. To enable them (and parallel uploads), add the following keys to your Hadoop config:

      <property>
        <name>fs.s3n.multipart.uploads.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>fs.s3n.multipart.uploads.block.size</name>
        <value>67108864</value>
      </property>
      <property>
        <name>fs.s3n.multipart.copy.block.size</name>
        <value>5368709120</value>
      </property>

      Then create a /etc/hadoop/conf/jets3t.properties file containing the following (or similar):

      storage-service.internal-error-retry-max=5
      storage-service.disable-live-md5=false
      threaded-service.max-thread-count=20
      threaded-service.admin-max-thread-count=20
      s3service.max-thread-count=20
      s3service.admin-max-thread-count=20
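
      With these settings in place, large files can be written through the ordinary fs shell; any file at or above fs.s3n.multipart.uploads.block.size is uploaded in parts. For example (assuming your AWS credentials are configured and "your-bucket" is a placeholder for a bucket you can write to):

      hadoop fs -put /data/large-image.img s3n://your-bucket/large-image.img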

      Attachments

      1. HADOOP-9454-10.patch (32 kB, Jordan Mendelson)
      2. HADOOP-9454-11.patch (14 kB, Akira Ajisaka)
      3. HADOOP-9454-12.patch (15 kB, Akira Ajisaka)

        Issue Links

          Activity

          aloisius Jordan Mendelson added a comment -

          Here is a patch against trunk which adds multipart upload support. It also updates the jets3t library to 0.90 (based on a patch in HADOOP-8136).

          It is difficult to build automated tests against this, as it requires a valid S3 access key in order to test writing to S3 buckets. However, I verified that it does indeed allow uploads of more than 5 GB: I uploaded an 8 GB image of my root filesystem, renamed it on S3 (which requires a multipart upload copy), downloaded it, and compared the md5sum. The filesystem continues to work as normal when fs.s3n.multipart.uploads.enabled is set to false, and I have run through various fs commands to verify that everything works as it should.

          This patch adds two config options: fs.s3n.multipart.uploads.enabled and fs.s3n.multipart.uploads.block.size. The former is named after the Amazon setting which does the same thing (defaults to false). The latter controls both the minimum file size at which multipart uploads are used and the part size (default: 64 MB); with the default, for example, an 8 GB file is uploaded as 128 parts.

          By default, jets3t will only spawn two upload threads, but you can change this by setting the threaded-service.max-thread-count property in the jets3t.properties file. I've tried with upwards of 20 threads and it is significantly faster.

          This patch should also work with older versions of Hadoop with only minor changes, since the s3native and s3 filesystems haven't changed much. I originally wrote it for CDH 4.

          Please note that because of the way hadoop fs works, it requires a remote copy, which takes a while for large files. This is because hadoop fs copies files as filename.COPYING and then renames them. Unfortunately, there is no rename support on Amazon S3, so we must do a copy() followed by a delete(), and the copy() can take quite a while for large files (see the sketch below). Also because of this, when multipart upload is enabled, an additional request is made to AWS when doing a copy, to check whether the source file is larger than 5 GB.
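
          For illustration, a minimal sketch of that copy-then-delete rename. ObjectStore here is a hypothetical stand-in for Hadoop's internal store interface, not the patch's actual code:

          import java.io.IOException;

          // Hypothetical minimal store interface, for illustration only.
          interface ObjectStore {
            void copy(String srcKey, String dstKey) throws IOException;
            void delete(String key) throws IOException;
          }

          class S3Rename {
            // S3 exposes no rename primitive, so rename(src, dst) is emulated as a
            // server-side copy followed by a delete of the source key. Copies of
            // objects larger than 5 GB must themselves use the multipart copy API.
            static void rename(ObjectStore store, String src, String dst) throws IOException {
              store.copy(src, dst);   // can take a long time for large objects
              store.delete(src);      // remove the original only after the copy succeeds
            }
          }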

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12576927/HADOOP-9454-1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 javac. The applied patch generated 1380 javac compiler warnings (more than the trunk's current 1367 warnings).

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/2410//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/2410//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
          Javac warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/2410//artifact/trunk/patchprocess/diffJavacWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2410//console

          This message is automatically generated.

          aloisius Jordan Mendelson added a comment -

          Whoops, remove that unused variable.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12576935/HADOOP-9454-2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 javac. The applied patch generated 1380 javac compiler warnings (more than the trunk's current 1367 warnings).

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/2411//testReport/
          Javac warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/2411//artifact/trunk/patchprocess/diffJavacWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2411//console

          This message is automatically generated.

          aloisius Jordan Mendelson added a comment -

          I doubt we're going to get those warnings down without a more major overhaul. They are due to the fact that many of the interfaces used in both the s3 and s3native filesystems were deprecated in jets3t 0.8.0. They still work, but they should be swapped out for the newer, more generic interface.

          stevel@apache.org Steve Loughran added a comment -

          The core changes look good, and this upgrades to the latest jets3t, superseding HADOOP-8136.

          What it lacks is any test of multipart upload.

          I would recommend lifting https://github.com/hortonworks/Hadoop-and-Swift-integration/blob/converged/swift-file-system/src/test/java/org/apache/hadoop/fs/swift/TestSwiftFileSystemPartitionedUploads.java and using it as the basis for a test. It probes the output stream for the number of parts uploaded, and verifies that the result of the bulk upload matches the original.
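
          A rough sketch of such a round-trip test follows. The property names are the ones this patch introduces; probing the actual part count would additionally need an accessor on the output stream, which is omitted here, so treat this as an outline rather than the suggested test itself:

          import static org.junit.Assert.assertArrayEquals;

          import java.net.URI;
          import java.util.Random;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FSDataInputStream;
          import org.apache.hadoop.fs.FSDataOutputStream;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;
          import org.junit.Test;

          public class TestS3NPartitionedUpload {

            // Write a file spanning several parts, read it back, and verify the
            // round trip is byte-identical to the original data.
            @Test
            public void testPartitionedUploadRoundTrip() throws Exception {
              Configuration conf = new Configuration();
              conf.setBoolean("fs.s3n.multipart.uploads.enabled", true);
              conf.setLong("fs.s3n.multipart.uploads.block.size", 5 * 1024 * 1024); // 5 MB parts
              FileSystem fs = FileSystem.get(URI.create(conf.get("test.fs.s3n.name")), conf);

              byte[] src = new byte[12 * 1024 * 1024]; // forces at least three parts
              new Random(0).nextBytes(src);

              Path path = new Path("/test/partitioned-upload");
              FSDataOutputStream out = fs.create(path, true);
              out.write(src);
              out.close();

              byte[] roundTrip = new byte[src.length];
              FSDataInputStream in = fs.open(path);
              in.readFully(0, roundTrip);
              in.close();

              assertArrayEquals(src, roundTrip);
            }
          }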

          apurtell Andrew Purtell added a comment -

          +1 upgrading jets3t is good for several reasons and this provides a nice benefit.

          aloisius Jordan Mendelson added a comment -

          This version includes a test that uploads three files of various sizes, renames them, then downloads them and compares the hashes. It tests both normal and multipart uploads, as well as multipart copies.

          It will not run unless your test core-site.xml file has valid AWS credentials and test.fs.s3n.name is filled out properly (otherwise it just skips the tests). Also, since the only way to test the multipart copy is to upload a 5 GB file, it takes quite a while to run on a non-network-optimized instance (the test runner seems to kill it if it takes over ~10 minutes).

          I've included a test jets3t.properties which increases the upload thread count so the test can run in a reasonable amount of time. Downloading actually takes significantly longer than multipart uploading (which we could possibly fix with parallel downloading in the future?).

          aloisius Jordan Mendelson added a comment -

          I suppose parallel download is unnecessary, considering Hadoop will happily split the input itself. It would just come in handy for speeding up that one test.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12577139/HADOOP-9454-3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          -1 javac. The applied patch generated 1380 javac compiler warnings (more than the trunk's current 1367 warnings).

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          -1 release audit. The applied patch generated 1 release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/2416//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/2416//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Javac warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/2416//artifact/trunk/patchprocess/diffJavacWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2416//console

          This message is automatically generated.

          aloisius Jordan Mendelson added a comment -

          Add license information to config file.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12577146/HADOOP-9454-4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          -1 javac. The applied patch generated 1380 javac compiler warnings (more than the trunk's current 1367 warnings).

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/2417//testReport/
          Javac warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/2417//artifact/trunk/patchprocess/diffJavacWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2417//console

          This message is automatically generated.

          aloisius Jordan Mendelson added a comment -

          Move away from deprecated methods to reduce warnings.

          aloisius Jordan Mendelson added a comment -

          Whoops, include all the new files.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12583707/HADOOP-9454-5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          -1 javac. The applied patch generated 1380 javac compiler warnings (more than the trunk's current 1367 warnings).

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          -1 release audit. The applied patch generated 1 release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/2547//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/2547//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Javac warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/2547//artifact/trunk/patchprocess/diffJavacWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2547//console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12583713/HADOOP-9454-9.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/2549//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2549//console

          This message is automatically generated.

          aloisius Jordan Mendelson added a comment -

          Fix a bug that would cause extremely slow calls to mkdir.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12588273/HADOOP-9454-10.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          -1 javac. The applied patch generated 1154 javac compiler warnings (more than the trunk's current 1152 warnings).

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/2662//testReport/
          Javac warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/2662//artifact/trunk/patchprocess/diffJavacWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2662//console

          This message is automatically generated.

          joecrobak Joe Crobak added a comment -

          Is this patch abandoned? Or is it not being accepted? I'd love to see this feature added.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12588273/HADOOP-9454-10.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3544//console

          This message is automatically generated.

          ajisakaa Akira Ajisaka added a comment -

          I want this feature, too. Rebased the patch for the latest trunk.

          ajisakaa Akira Ajisaka added a comment -

          Added new properties to core-default.xml.

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12628895/HADOOP-9454-12.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3576//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3576//console

          This message is automatically generated.

          tarnfeld Tom Arnfeld added a comment -

          Could you tell me which version of Hadoop this will be going into? I'm still using the old mapred API (not Hadoop 2) and am incredibly keen to use this.

          Does this begin uploading multipart chunks as soon as Hadoop begins to spill? This was one of the biggest gains I saw on EMR (and, from my understanding of their implementation of S3 multipart uploads, it can quite considerably reduce job run time for large outputs).
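
          For reference, the spill-time idea in sketch form; PartUploader is a hypothetical stand-in for the real S3 client, and this is not EMR's or this patch's actual code. Each time the in-memory buffer fills one part, that part is shipped immediately instead of waiting for close():

          import java.io.ByteArrayOutputStream;
          import java.io.IOException;
          import java.io.OutputStream;

          // Hypothetical part-upload callback, for illustration only.
          interface PartUploader {
            void uploadPart(int partNumber, byte[] data) throws IOException;
            void complete(int partCount) throws IOException;
          }

          class StreamingMultipartOutputStream extends OutputStream {
            private final PartUploader uploader;
            private final int partSize;
            private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            private int partNumber = 0;

            StreamingMultipartOutputStream(PartUploader uploader, int partSize) {
              this.uploader = uploader;
              this.partSize = partSize;
            }

            @Override
            public void write(int b) throws IOException {
              buffer.write(b);
              if (buffer.size() >= partSize) {
                flushPart(); // ship a full part while the job is still writing
              }
            }

            private void flushPart() throws IOException {
              uploader.uploadPart(++partNumber, buffer.toByteArray());
              buffer.reset();
            }

            @Override
            public void close() throws IOException {
              if (buffer.size() > 0) {
                flushPart(); // final, possibly short, part
              }
              uploader.complete(partNumber);
            }
          }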

          ajisakaa Akira Ajisaka added a comment -

          The patch is for trunk, so it's not for the Hadoop 1.x releases. If the patch is reviewed and committed to trunk, I'm willing to create a patch for branch-1.

          Committers, would you please review the latest patch?

          aloisius Jordan Mendelson added a comment -

          I've abandoned working on this patch. It works, but it is a rather big pain for me to patch and deploy a copy of Hadoop, so I wrote a separate S3 filesystem that uses the AWS SDK instead of jets3t, which I use with CDH 4. It should probably work with other versions as well: https://github.com/Aloisius/hadoop-s3a

          atm Aaron T. Myers added a comment (edited) -

          Akira / Jordan - thanks a lot for working on this. In the abstract I'm happy to check this change into Hadoop, but I don't consider myself especially well qualified to review this change since I'm not super familiar with S3/jets3t. I've asked Amandeep Khurana to take a look at it, since he's been involved with some of the more recent work around upgrading jets3t, etc. and if it looks good to him, I'll check it in, in which case it'll likely first show up in Hadoop 2.4.0.

          Jordan - would you have any interest in contributing your S3A implementation to Hadoop? Is it Apache licensed? If so, we should file a new JIRA to get that checked in alongside the existing S3 and S3N FS implementations.

          ndimiduk Nick Dimiduk added a comment -

          we should file a new JIRA to get that checked in alongside the existing S3 and S3N FS implementations

          Would it not be better to replace the jets3t implementation with one backed by AWS's own SDK? S3 vs. S3N is confusing enough for folks; IMHO it's better not to add additional choices into the mix.

          atm Aaron T. Myers added a comment -

          Would it not be better to replace the Jets3t implementation with one backed by AWS's own SDK? S3 vs S3N is confusing enough for folks, IMHO better to not add additional choices into the mix.

          I wouldn't necessarily be opposed to that, but with an eye toward introducing an AWS SDK-based FS in a way that ensures there are no regressions vs. the existing S3 and S3N file systems, my preference would be to check in a net new implementation and deprecate the existing ones.

          amansk Amandeep Khurana added a comment -

          Would it not be better to replace the Jets3t implementation with one backed by AWS's own SDK? S3 vs S3N is confusing enough for folks, IMHO better to not add additional choices into the mix.

          Yes, absolutely. If Jordan Mendelson submits a patch, we should commit it, have it be an option in parallel to the current s3n option, and deprecate the s3 and s3n implementations in due course.

          I just reviewed this patch and it looks good to me. The only thing is that it does not parallelize the movement of large files using MR, so multiple mappers can't upload different parts of a large file. Also, I don't know for a fact that it's possible to split a file into multiple parts and have individual mappers do the uploads with the current implementation; if it is, it's not without significant changes to this patch.

          Having said that, I think this patch can be put in and we can open another jira for enhancements.

          atm Aaron T. Myers added a comment -

          That all sounds good to me.

          +1, I'm going to commit this momentarily.

          atm Aaron T. Myers added a comment -

          I've just committed this to trunk, branch-2, and branch-2.4.

          Thanks a lot for the contribution, Jordan and Akira.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #5233 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5233/)
          HADOOP-9454. Support multipart uploads for s3native. Contributed by Jordan Mendelson and Akira AJISAKA. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1572235)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/s3native/TestJets3tNativeFileSystemStore.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/resources/jets3t.properties
          ndimiduk Nick Dimiduk added a comment -

          Woo hoo!

          ajisakaa Akira Ajisaka added a comment -

          Thanks Amandeep Khurana for reviewing, and Aaron T. Myers for committing!

          aloisius Jordan Mendelson added a comment -

          The S3A implementation I wrote is Apache licensed. It has a few benefits over this patch, including parallel copy (and rename) support, not requiring those _$folder$ marker files (it uses filename/, which is what the Amazon web interface uses as well), and allowing multiple buffer output directories (the file must be written out locally before upload so we can compute its MD5, which AWS requires). I've been using it in production for several months.
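
          As an aside, computing the MD5 up front is straightforward with standard JDK APIs; a minimal sketch (illustration only, not S3A's actual code) of digesting a local buffer file before upload:

          import java.io.FileInputStream;
          import java.io.IOException;
          import java.io.InputStream;
          import java.security.DigestInputStream;
          import java.security.MessageDigest;
          import java.security.NoSuchAlgorithmException;

          class Md5OfFile {
            // A buffer file is needed because the Content-MD5 value must be known
            // before the S3 PUT begins.
            static byte[] md5(String path) throws IOException, NoSuchAlgorithmException {
              MessageDigest md = MessageDigest.getInstance("MD5");
              InputStream in = new DigestInputStream(new FileInputStream(path), md);
              try {
                byte[] buf = new byte[8192];
                while (in.read(buf) != -1) {
                  // the digest is updated as a side effect of reading
                }
              } finally {
                in.close();
              }
              return md.digest();
            }
          }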

          The only reason I switched to the AWS SDK was to get abortable HTTP calls. Hadoop likes to seek around things like sequence files when searching for split points, which causes all sorts of performance problems if you can't abort your HTTP call (the current s3n implementation, IIRC, will just read through the entire request before closing the connection and reopening it).

          atm Aaron T. Myers added a comment -

          Jordan Mendelson - thanks for the explanation. If you'd be up for it, mind filing another HADOOP JIRA to contribute the S3A implementation? If so, link it here and I'll be sure to follow it and help with reviewing/committing it.

          stevel@apache.org Steve Loughran added a comment -

          I'll rebase my HADOOP-9361 patch on this and see if the extra tests throw up any surprises. I think I'd like to move the inline string constants out into static strings for less brittleness (that's an issue all around the Hadoop codebase).

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #494 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/494/)
          HADOOP-9454. Support multipart uploads for s3native. Contributed by Jordan Mendelson and Akira AJISAKA. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1572235)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/s3native/TestJets3tNativeFileSystemStore.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/resources/jets3t.properties
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #1686 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1686/)
          HADOOP-9454. Support multipart uploads for s3native. Contributed by Jordan Mendelson and Akira AJISAKA. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1572235)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/s3native/TestJets3tNativeFileSystemStore.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/resources/jets3t.properties
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1711 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1711/)
          HADOOP-9454. Support multipart uploads for s3native. Contributed by Jordan Mendelson and Akira AJISAKA. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1572235)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/s3native/TestJets3tNativeFileSystemStore.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/resources/jets3t.properties
          tarnfeld Tom Arnfeld added a comment -

          Wow! Awesome. I've just come back to this thread. Akira Ajisaka, you mentioned you'd be up for backporting to Hadoop 1; have you had a chance to look at that yet? I guess an alternative for me would be to simply use S3A (directly from https://github.com/Aloisius/hadoop-s3a). We're currently on CDH3, so moving to CDH4 shouldn't be an issue.

          ajisakaa Akira Ajisaka added a comment -

          Tom Arnfeld, I looked around the branch-1 code and found that it would take me a long time to create a backport patch. If you are satisfied with using the S3A implementation, I'd like to lower the priority; I'll continue to work on this if you really want it.

          Developers: anyone is welcome to take over this backporting issue.

          aloisius Jordan Mendelson added a comment -

          Aaron T. Myers, I had a go at patching trunk for the s3a filesystem in HADOOP-10400. Maven and I don't get along, so I basically just searched out every place that referenced jets3t and added the AWS SDK. Hopefully it worked OK.

          aloisius Jordan Mendelson added a comment -

          Whoops, I don't know how to link names properly. HADOOP-10400 should have a patch that incorporates s3a, if Tom Arnfeld, Aaron T. Myers, or Amandeep Khurana want to take a look. Also, the GitHub repo now uses Maven to build, so it should be a bit easier either way.

          atm Aaron T. Myers added a comment -

          Great, thanks a lot, Jordan. I'll take a look over on HADOOP-10400.

          raviprak Ravi Prakash added a comment -

          Hi! Could someone please tell me how to run the test that was added as part of this JIRA? Do what I may, it's being skipped:

          $ mvn -Dtest=TestJets3tNativeFileSystemStore test
          ......
          Running org.apache.hadoop.fs.s3native.TestJets3tNativeFileSystemStore
          Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.207 sec
          
          stevel@apache.org Steve Loughran added a comment -

          You need to set the properties checked for in checkSettings() in your test/resources/core-site.xml:

              Configuration conf = new Configuration();
              assumeNotNull(conf.get("fs.s3n.awsAccessKeyId"));
              assumeNotNull(conf.get("fs.s3n.awsSecretAccessKey"));
              assumeNotNull(conf.get("test.fs.s3n.name"));
          

          Don't check the secrets in...
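
          For example, a local (never committed!) test core-site.xml might contain placeholder entries like:

          <property>
            <name>fs.s3n.awsAccessKeyId</name>
            <value>YOUR_ACCESS_KEY_ID</value>
          </property>
          <property>
            <name>fs.s3n.awsSecretAccessKey</name>
            <value>YOUR_SECRET_ACCESS_KEY</value>
          </property>
          <property>
            <name>test.fs.s3n.name</name>
            <value>s3n://your-test-bucket/</value>
          </property>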


            People

            • Assignee: ajisakaa Akira Ajisaka
            • Reporter: aloisius Jordan Mendelson
            • Votes: 4
            • Watchers: 23
