Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.7.0
    • Component/s: fs/s3
    • Labels: None
    • Target Version/s:

      Description

      Currently, s3a buffers files on local disk(s) before uploading them to S3. This JIRA investigates adding a memory-based upload implementation.

      The motivation is, evidently, performance: this would benefit users with high network bandwidth to S3 (e.g., on EC2) or users who run Hadoop directly against an S3-compatible object store (FYI: my contributions are made on behalf of Amplidata).
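      For context, this is roughly how a user would opt in once the feature lands. A minimal sketch, assuming the configuration key fs.s3a.fast.upload that the patches and comments below refer to; the key name and its default are assumptions here, not committed API:

        import java.net.URI;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class FastUploadExample {
          public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Opt in to the memory-based output stream; the default (false)
            // keeps the existing disk-buffered S3AOutputStream.
            conf.setBoolean("fs.s3a.fast.upload", true);
            FileSystem fs = FileSystem.get(new URI("s3a://mybucket/"), conf);
            try (java.io.OutputStream out = fs.create(new Path("/example"))) {
              out.write(new byte[1024 * 1024]); // buffered in memory, not on disk
            }
          }
        }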

      1. HADOOP-11183.001.patch
        25 kB
        Thomas Demoor
      2. HADOOP-11183.002.patch
        26 kB
        Thomas Demoor
      3. HADOOP-11183.003.patch
        22 kB
        Thomas Demoor
      4. design-comments.pdf
        44 kB
        Thomas Demoor
      5. HADOOP-11183-004.patch
        22 kB
        Thomas Demoor
      6. HADOOP-11183-005.patch
        26 kB
        Thomas Demoor
      7. HADOOP-11183-006.patch
        28 kB
        Thomas Demoor
      8. HADOOP-11183-007.patch
        29 kB
        Thomas Demoor
      9. HADOOP-11183-008.patch
        30 kB
        Steve Loughran
      10. HADOOP-11183-009.patch
        30 kB
        Thomas Demoor
      11. HADOOP-11183-010.patch
        30 kB
        Thomas Demoor

        Issue Links

          Activity

          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2072 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2072/)
          HADOOP-11183. Memory-based S3AOutputstream. (Thomas Demoor via stevel) (stevel: rev 15b7076ad5f2ae92d231140b2f8cebc392a92c87)

          • hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AFastOutputStream.java
          • hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFastOutputStream.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #122 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/122/)
          HADOOP-11183. Memory-based S3AOutputstream. (Thomas Demoor via stevel) (stevel: rev 15b7076ad5f2ae92d231140b2f8cebc392a92c87)

          • hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AFastOutputStream.java
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFastOutputStream.java
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
          • hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #113 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/113/)
          HADOOP-11183. Memory-based S3AOutputstream. (Thomas Demoor via stevel) (stevel: rev 15b7076ad5f2ae92d231140b2f8cebc392a92c87)

          • hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFastOutputStream.java
          • hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AFastOutputStream.java
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
          • hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2054 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2054/)
          HADOOP-11183. Memory-based S3AOutputstream. (Thomas Demoor via stevel) (stevel: rev 15b7076ad5f2ae92d231140b2f8cebc392a92c87)

          • hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
          • hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AFastOutputStream.java
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFastOutputStream.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #856 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/856/)
          HADOOP-11183. Memory-based S3AOutputstream. (Thomas Demoor via stevel) (stevel: rev 15b7076ad5f2ae92d231140b2f8cebc392a92c87)

          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
          • hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFastOutputStream.java
          • hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
          • hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AFastOutputStream.java
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #122 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/122/)
          HADOOP-11183. Memory-based S3AOutputstream. (Thomas Demoor via stevel) (stevel: rev 15b7076ad5f2ae92d231140b2f8cebc392a92c87)

          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFastOutputStream.java
          • hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AFastOutputStream.java
          thodemoor Thomas Demoor added a comment -

          Thanks, Steve.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #7247 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7247/)
          HADOOP-11183. Memory-based S3AOutputstream. (Thomas Demoor via stevel) (stevel: rev 15b7076ad5f2ae92d231140b2f8cebc392a92c87)

          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFastOutputStream.java
          • hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
          • hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AFastOutputStream.java
          stevel@apache.org Steve Loughran added a comment -

          +1

          • Tested locally; a 6 MB upload actually took "5.504" seconds, which is surprisingly fast, but I couldn't see any evidence that it hadn't happened, and compared to a regular one at 1.7s, it scaled proportionally to the 6x size increase.
          • Saw an intermittent failure of TestS3AFileSystemContract.testRenameRootDirForbidden(), which has been seen before and is unrelated to this patch.
          • Docs are clearer now.
          • The javac and javadoc warnings appear unrelated.
          • The core test failure is the intermittent ZK one again.

          Applying. Thanks!

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12702232/HADOOP-11183-010.patch
          against trunk revision e17e5ba.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          -1 javac. The applied patch generated 1236 javac compiler warnings (more than the trunk's current 1199 warnings).

          -1 javadoc. The javadoc tool appears to have generated 25 warning messages.
          See https://builds.apache.org/job/PreCommit-HADOOP-Build/5835//artifact/patchprocess/diffJavadocWarnings.txt for details.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws:

          org.apache.hadoop.ha.TestZKFailoverControllerStress

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5835//testReport/
          Javac warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/5835//artifact/patchprocess/diffJavacWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5835//console

          This message is automatically generated.

          thodemoor Thomas Demoor added a comment -

          Rebased vs trunk

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12702156/HADOOP-11183-009.patch
          against trunk revision 4228de9.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5831//console

          This message is automatically generated.

          thodemoor Thomas Demoor added a comment -

          Spurious indeed. Clarified threshold and partsize settings.

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12702087/HADOOP-11183-008.patch
          against trunk revision d1c6acc.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5824//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5824//console

          This message is automatically generated.

          stevel@apache.org Steve Loughran added a comment -

          The patch didn't apply as the (spurious?) changes to the fs contract base test were incompatible; the 008 patch fixes that by removing that file from the diff and has some formatting & typo fixes to the documentation. I'll do a test run when I next have bandwidth; from the code review itself it looks good.

          One issue is that I didn't fully understand the bit in the docs regarding "the maximum size of a memory buffer", mapping the properties to that limit:

          The maximum size of a memory buffer is `fs.s3a.multipart.threshold` / `fs.s3a.multipart.size` for an upload / partupload respectively.

          What does that mean?

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12701903/HADOOP-11183-007.patch
          against trunk revision ca1c00b.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws:

          org.apache.hadoop.ipc.TestRPCWaitForProxy

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5813//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5813//console

          This message is automatically generated.

          thodemoor Thomas Demoor added a comment -

          The test class now forces fast upload to true. It tests a regular 1 MB upload and a 5+1 MB multi-part upload; a sketch of that setup follows below.

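          To make the described test setup concrete, here is a minimal, hedged sketch. The key names fs.s3a.fast.upload and fs.s3a.multipart.size are assumed from the surrounding discussion, and the test bodies are stubs, not the actual TestS3AFastOutputStream code:

            import org.apache.hadoop.conf.Configuration;
            import org.junit.Test;

            public class FastUploadTestSketch {
              private Configuration newConf() {
                Configuration conf = new Configuration();
                conf.setBoolean("fs.s3a.fast.upload", true);            // force the fast path
                conf.setLong("fs.s3a.multipart.size", 5 * 1024 * 1024); // 5 MB: AWS minimum part size
                return conf;
              }

              @Test
              public void testRegularUpload() throws Exception {
                // Write 1 MB: stays below the multipart threshold, so a single PUT.
              }

              @Test
              public void testSmallMultipartUpload() throws Exception {
                // Write 6 MB: uploaded as one 5 MB part plus a 1 MB part.
              }
            }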
          stevel@apache.org Steve Loughran added a comment -

          For a test, yes, both output streams. You're adding new production code here.

          thodemoor Thomas Demoor added a comment -

          Thanks Steve.

          I agree on the 6 MB multipart. Fast upload is not used by default; you still have to enable it in core-site.xml. S3AFastOutputStream does both single uploads and multipart uploads. I added this test just to have a small multipart upload (rather than that commented-out large multipart test somewhere) to exercise both output stream implementations. Would you like me to force fast.upload to true?

          stevel@apache.org Steve Loughran added a comment -

          Looking at the comments, I see that 5 MB is the minimum.

          How about a test that writes, say 6MB?

          stevel@apache.org Steve Loughran added a comment -
          1. does TestS3ASmallMultiPartUpload actually enable the new feature?
          2. a 40 MB upload/download can still take time on a slow link. Could we make this smaller, so the whole upload size is less than, say 1 MB?
          stevel@apache.org Steve Loughran added a comment -

          OK, doing a test run on this. Like you say, it's harmless unless enabled, and the docs now make that clear.

          The main risk is taking on the obligation to maintain it for an indefinite period. However, on the basis that we could always return the classic output stream, cut this code and simply print a warning if the flag is set, that's a low cost obligation.

          thodemoor Thomas Demoor added a comment -

          Addendum to my previous comment, to be clearer on flush(): the minimum size of 5 MB is for a partUpload (of an ongoing multi-part upload). Evidently, one can write smaller objects through a single PUT.

          The reason I think it's safe to put this in 2.7 as unstable is that the codepath is never touched when the config flag is set to false (the default). It's a drop-in replacement for S3AOutputStream, but it is only used if the user opts in; a sketch of that dispatch follows below.

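          An illustrative sketch of such an opt-in dispatch. The flag name is assumed and the stream constructors are stand-ins; this is not the committed S3AFileSystem code:

            import java.io.ByteArrayOutputStream;
            import java.io.IOException;
            import java.io.OutputStream;
            import org.apache.hadoop.conf.Configuration;

            // Illustrative only: the fast path is taken solely when the user
            // opts in, so the default code path is untouched.
            final class OutputStreamChooser {
              private final Configuration conf;

              OutputStreamChooser(Configuration conf) {
                this.conf = conf;
              }

              OutputStream create(String key) throws IOException {
                if (conf.getBoolean("fs.s3a.fast.upload", false)) { // assumed key, default off
                  return newFastStream(key);   // memory-buffered, incremental multipart uploads
                }
                return newClassicStream(key);  // buffers the whole file on local disk first
              }

              // Stand-ins for the real stream constructors, elided here.
              private OutputStream newFastStream(String key) { return new ByteArrayOutputStream(); }
              private OutputStream newClassicStream(String key) { return new ByteArrayOutputStream(); }
            }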
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12700121/HADOOP-11183-006.patch
          against trunk revision fe7a302.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5757//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5757//console

          This message is automatically generated.

          thodemoor Thomas Demoor added a comment -

          I ran with the mark-as-unstable suggestion and implemented your suggestions.

          write()
          2. If I created a 1-byte buffer, would that tail recursion trigger a stack overflow? If so, some checks in ctor may be good, such as just hard coded minimum & upping low values to it.

          No, the executor's LinkedBlockingQueue blocks writes once max.total.tasks are waiting (see the sketch below).

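          The pattern behind that answer, sketched generically (the real implementation is the blocking executor from HADOOP-11463, not this code): a plain ThreadPoolExecutor with a bounded queue would reject work when the queue fills, so the rejection handler here converts rejection into a blocking put, which is what caps the writer's memory footprint.

            import java.util.concurrent.ExecutorService;
            import java.util.concurrent.LinkedBlockingQueue;
            import java.util.concurrent.RejectedExecutionException;
            import java.util.concurrent.ThreadPoolExecutor;
            import java.util.concurrent.TimeUnit;

            public final class BlockingExecutors {
              // Once 'maxQueuedTasks' uploads are waiting, submit() blocks the
              // writer instead of rejecting, capping memory held in pending buffers.
              public static ExecutorService newBlockingFixedPool(int threads, int maxQueuedTasks) {
                return new ThreadPoolExecutor(
                    threads, threads, 0L, TimeUnit.MILLISECONDS,
                    new LinkedBlockingQueue<Runnable>(maxQueuedTasks),
                    (task, executor) -> {
                      try {
                        executor.getQueue().put(task); // block until a slot frees up
                      } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new RejectedExecutionException("Interrupted while queueing", e);
                      }
                    });
              }
            }

          A writer calling submit() on such a pool simply stalls once maxQueuedTasks buffers are pending, instead of allocating more.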
          Can we have flush() do a partial write? Would that gain anything in durability terms? At the very least, it may make timing of writes more consistent with other filesystems.

          AWS has a minimum part size of 5 MB, so if the buffer is currently smaller .... I can think of a workaround but it's messy.

          I can think of some more tests here. If S3AFastOutputStream added a counter of #of uploads, a test could grab that stream and verify that writes triggered it, flushes triggered it, 0-byte-writes didn't, close cleaned up, etc. Also be interesting to time & compare classic vs fast operations & print that in the test results.

          I agree we should add these over time. Counting async operations is non-trivial, however.

          stevel@apache.org Steve Loughran added a comment -

          I have done a code review, but not actually used this yet. I'm worried it's being looked at a bit late for the 2.7 branch, but I can't fault anyone but myself and other reviewers for that.

          If we get it in, I'd like the markdown docs to highlight "experimental, unstable, use at own risk" for this option. Then 2.7 can be viewed as stabilisation of it, with the goal of it being trusted by 2.8 and that warning removed. Perhaps a special subsection of s3a, "experimental fast output stream", with some specifics, rather than just listing a config flag, "fast", which everyone is going to turn on based on its name alone.

          constructor

          1. check for and reject a negative buffer size. Maybe also have a check and warn on/reject too small a buffer?
          2. S3AFastOutputStream constructor should log @debug, not info

          write()

          1. would benefit from some comments explaining the two branches, e.g. "copy into buffer" and "buffer full, write it" + "recursively write remainder"
          2. If I created a 1-byte buffer, would that tail recursion trigger a stack overflow? If so, some checks in ctor may be good, such as just hard coded minimum & upping low values to it.

          close()

          I'd swap the operations in the finally clause, so no matter what the superclass does, this close operation always finishes. Better yet, set the closed flag earlier.

          uploadBuffer()

          make private

          putObject()

          InterruptedException shouldn't be swallowed here; it hides problems and can cause process shutdowns to hang. Either convert it to InterruptedIOException, or set the interrupted flag on the current thread again (see the sketch after this review).

          waitForAllPartUploads()

          Need a policy on interrupts here too. Imagine that communications with a remote S3 server are blocked, something has just sent a kill -3 to the client app, and is now waiting for it to finish cleanly...

          ProgressableListener

          can be made private

          Other.

          1. Can we have flush() do a partial write? Would that gain anything in durability terms? At the very least, it may make timing of writes more consistent with other filesystems.
          2. I can think of some more tests here. If S3AFastOutputStream added a counter of #of uploads, a test could grab that stream and verify that writes triggered it, flushes triggered it, 0-byte-writes didn't, close cleaned up, etc. Also be interesting to time & compare classic vs fast operations & print that in the test results.
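          A minimal sketch of the interrupt policy requested above for putObject() and waitForAllPartUploads(), assuming a hypothetical helper that waits on part-upload futures; the names are illustrative, not the patch's code:

            import java.io.IOException;
            import java.io.InterruptedIOException;
            import java.util.List;
            import java.util.concurrent.ExecutionException;
            import java.util.concurrent.Future;

            final class UploadWaiter {
              // Surface interruption as an IOException while preserving the
              // thread's interrupt status, so shutdowns don't hang silently.
              static <T> void waitForAll(List<Future<T>> parts) throws IOException {
                for (Future<T> part : parts) {
                  try {
                    part.get();
                  } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the flag for callers
                    InterruptedIOException ioe =
                        new InterruptedIOException("Interrupted waiting for part upload: " + e);
                    ioe.initCause(e);
                    throw ioe;
                  } catch (ExecutionException e) {
                    throw new IOException("Part upload failed: " + e.getCause(), e.getCause());
                  }
                }
              }
            }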
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12699876/HADOOP-11183-005.patch
          against trunk revision 6f01330.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5751//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/5751//artifact/patchprocess/newPatchFindbugsWarningshadoop-aws.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5751//console

          This message is automatically generated.

          thodemoor Thomas Demoor added a comment -

          Marked as unstable.

          The underlying httpclient retries retriable errors, and you can control it through fs.s3a.attempts.maximum.

          Did some more investigation: the dominant failure time is the DNS failing to resolve. After that, the subsequent parts fail fast, in line with what is set in fs.s3a.establish.timeout. So the fail-fast I had in mind (and have implemented) seems like premature optimization. I have been testing the current code for some time, so I think we shouldn't take the risk of putting fail-fast in this close to 2.7; I'll open a separate JIRA for fail-fast.

          Added site and core-default documentation. While passing by, I corrected the description of the connection timeouts: they are defined in milliseconds, not seconds.

          stevel@apache.org Steve Loughran added a comment -

          Exceptions

          1. we're going to need them translated into IOExceptions
          2. I'd like typed exceptions for the basic FileNotFound and permissions cases, though not so typed that they lose data

          Statistics.

          I don't have a good opinion here. For the swift stuff I didn't do enough integration with Hadoop stats; I did collect counters on the various HTTP verbs, including average time, which helped track down some throttling of DELETE.

          We can evolve statistics, especially if you highlight "unstable".

          Failures

          Fail-fast would be my preference, unless there's some attempt to retry on a transient outage. That might be a good option on some IOEs (connection &c), but not others (authentication failures).

          thodemoor Thomas Demoor added a comment -

          I have some open questions:

          Exceptions
          Server-side exceptions are thrown as AmazonServiceException (which extends AmazonClientException), which returns quite a lot of info from getMessage():

          public String getMessage() {
              return "Status Code: " + getStatusCode() + ", "
                  + "AWS Service: " + getServiceName() + ", "
                  + "AWS Request ID: " + getRequestId() + ", "
                  + "AWS Error Code: " + getErrorCode() + ", "
                  + "AWS Error Message: " + super.getMessage();
          }

          The error codes are detailed here: http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html

          A more general AmazonClientException is thrown if the server could not be reached (or there is another client-side problem). Do you want me to wrap the entire list in the link above in standard Java exceptions? I agree that typed exceptions are good as they provide more info, but considering the detailed response codes, this might not be the top priority.

          Statistics

          Unlike HDFS, s3a passes null to the wrapping FSDataOutputStream and does the statistics counting itself. It counts the bytes transferred to the server (double-counting retries, etc.) by adding listeners (from the AWS library) to the uploads. It also calls statistics.incrementWriteOps(1) on every part of a multipart upload. It thus gives an S3-centric view of the filesystem stats, not a Hadoop one.

          The introduced S3AFastOutputStream leaves byte counting to FSDataOutputStream (cf. HDFS) and only counts one writeOp per successful create(). It thus has different behavior. Should I revert to the S3AOutputStream way, or will we change that to be HDFS-like (in a separate JIRA)?

          Failures

          Currently, failure of a MultiPartUpload is only checked upon closing the file. So, for instance, if the server is unreachable, each part waits for the connection setup timeout to fail, which takes a while. Once one part has failed, we should abort as soon as possible. I think adding a callback to each partUpload (a ListenableFuture) that sets an AtomicBoolean failed = true if it has failed, and checking this before starting a partUpload, allows us to throw the exception at the start of the next partUpload (a sketch follows below).

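          That callback idea, sketched with Guava (assuming a Guava version that provides MoreExecutors.directExecutor(); the type parameter and class names are illustrative, not the patch's code):

            import com.google.common.util.concurrent.FutureCallback;
            import com.google.common.util.concurrent.Futures;
            import com.google.common.util.concurrent.ListenableFuture;
            import com.google.common.util.concurrent.MoreExecutors;
            import java.io.IOException;
            import java.util.concurrent.atomic.AtomicBoolean;

            final class FailFastTracker<T> {
              private final AtomicBoolean failed = new AtomicBoolean(false);

              // Attach a callback that records any part failure as it happens.
              void track(ListenableFuture<T> partUpload) {
                Futures.addCallback(partUpload, new FutureCallback<T>() {
                  @Override public void onSuccess(T result) { /* nothing to record */ }
                  @Override public void onFailure(Throwable t) { failed.set(true); }
                }, MoreExecutors.directExecutor());
              }

              // Call before launching the next part upload to abort early.
              void checkFailure() throws IOException {
                if (failed.get()) {
                  throw new IOException("A previous part upload failed; aborting");
                }
              }
            }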
          thodemoor Thomas Demoor added a comment -

          004.patch: a rebase of 003.patch vs trunk. Thanks, Steve Loughran.

          stevel@apache.org Steve Loughran added a comment -

          Thomas, if you can submit this in sync with trunk I'll take a look

          thodemoor Thomas Demoor added a comment -

          The info.md markdown file (now a PDF) contained design remarks (e.g., why this can't use TransferManager like the rest of the code), not general documentation for the project. I have refactored it (the paragraph on performance was indeed not of interest to the general community) and renamed it to design-comments.pdf to make the intent clearer.

          stevel@apache.org Steve Loughran added a comment -
          1. markdown docs should just go into src/site/markdown, with relevant links, as part of the overall patch.
          2. please don't say bad things about HDFS performance, especially given ongoing work with erasure coding, TCP stack bypassing for local writes, etc etc. Best to leave all performance numbers out altogether.
          3. consider that within an EC2 VM, memory storage should be faster than virtualized HDD, SSD may change the values, etc. etc.
          4. ...so just say "may offer performance improvements, especially on operations with the object store "close" to the writer, where "close" is defined by network bandwidth and latency"
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12695552/HADOOP-11183.003.patch
          against trunk revision ac8d52b.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-tools/hadoop-aws.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5669//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/5669//artifact/patchprocess/newPatchFindbugsWarningshadoop-aws.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5669//console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12698411/info-003.md
          against trunk revision ac8d52b.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5670//console

          This message is automatically generated.

          stevel@apache.org Steve Loughran added a comment -

          I don't know enough about the AWS library to comment on that aspect; others will have to.

          One thing I would like to see (and which presumably could be applied to other bits of the S3a code) is better translation of AWS exceptions into common IOE subclasses, e.g. auth exception, file-already-exists exception, ....

          There's enough places where things are being caught and wrapped you could have a generic static IOException convertException(AmazonClientException method somewhere to do the conversion. IOEs on their own are fairly uninformative.

          Also: when wrapping exceptions, always include the{{toString()}} value of the nested exception in the extended text. That way, even if the nested exceptions get lost or stack traces don't get printed, the underlying problem is there for someone to look at.
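
To illustrate the kind of helper described above, here is a minimal sketch; it is not taken from any patch on this issue, and the class name, the status-code mapping, and the chosen IOException subclasses are all illustrative assumptions:

{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.AccessDeniedException;

import com.amazonaws.AmazonClientException;
import com.amazonaws.AmazonServiceException;

/**
 * Illustrative sketch only: translate an AmazonClientException into a
 * more specific IOException subclass, always embedding the original
 * exception's toString() in the message so the root cause survives
 * even if stack traces are lost.
 */
public final class S3AExceptionTranslation {

  private S3AExceptionTranslation() {
  }

  public static IOException convertException(String operation, String path,
      AmazonClientException e) {
    // String concatenation pulls in e.toString(), per the advice above.
    String message = operation + " on " + path + ": " + e;
    if (e instanceof AmazonServiceException) {
      // Service exceptions carry an HTTP status code we can map.
      AmazonServiceException ase = (AmazonServiceException) e;
      switch (ase.getStatusCode()) {
      case 401:
      case 403:
        return (IOException) new AccessDeniedException(path, null, message)
            .initCause(ase);
      case 404:
        return (IOException) new FileNotFoundException(message)
            .initCause(ase);
      default:
        return new IOException(message, ase);
      }
    }
    // Client-side failure (e.g. network): no HTTP status to map.
    return new IOException(message, e);
  }
}
{code}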

hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12695553/info-003.patch.md
          against trunk revision ac8d52b.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5668//console

          This message is automatically generated.

thodemoor Thomas Demoor added a comment -

Failure handling currently waits for all parts to fail, each typically taking up to a connection timeout. This should be improved, but I'd like some feedback on the happy path first.

thodemoor Thomas Demoor added a comment -

003.patch:

• concurrent uploading: parallel uploads / partUploads (of the same or different multipartUploads) run in different threads
• controllable maximum memory footprint through a ThreadPoolExecutor with a blocking queue (introduces a dependency on HADOOP-11463); a sketch of this pattern follows at the end of this comment
• More info in: info-003.patch.md
• Speedup: over my 10 MB/s connection at work, uploading a 128 MB file to AWS Ireland is 20% faster. Bigger files and higher throughput increase the speedup: directly against Amplidata's object store the difference is huge.
• Documentation is being added in HADOOP-11522

Difference with 002.patch:

• HDFS-like statistics counting: I prefer this; what do you think? See the previous comment for more info.

Remarks 2 and 3 from my comment on 001.patch are still open.
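
As a rough illustration of the bounded-memory executor mentioned in the list above, here is a minimal sketch using a plain java.util.concurrent ThreadPoolExecutor; the class name and sizing parameters are hypothetical, and the mechanism the patch actually depends on is the one from HADOOP-11463:

{code}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Sketch: an executor whose bounded queue makes submitters block,
 * capping the number of in-flight part buffers (and hence heap usage)
 * at roughly (maxThreads + queueCapacity) x partSize.
 */
public final class BlockingUploadExecutor {

  private BlockingUploadExecutor() {
  }

  public static ThreadPoolExecutor create(int maxThreads, int queueCapacity) {
    RejectedExecutionHandler blockWhenFull = new RejectedExecutionHandler() {
      @Override
      public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
        try {
          // Block the submitting stream until a queue slot frees up,
          // instead of rejecting the part upload.
          executor.getQueue().put(r);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new RejectedExecutionException("Interrupted while queueing", ie);
        }
      }
    };
    return new ThreadPoolExecutor(maxThreads, maxThreads,
        60L, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>(queueCapacity),
        blockWhenFull);
  }
}
{code}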

thodemoor Thomas Demoor added a comment -

002.patch: see 003.patch for the basic functionality, but this patch uses statistics collection similar to S3AOutputStream, which differs from what HDFS does: it counts bytes at the server side, thus double-counting retried bytes, and counts each partUpload of a MultipartUpload as a write operation.

(a related note: the Swift fs code does not even seem to do write-operation counting)
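
For contrast, client-side (HDFS-like) counting can be sketched as a thin wrapper; the class below is hypothetical and only illustrates the difference between the two approaches, while FileSystem.Statistics is the standard Hadoop per-scheme counter object:

{code}
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.fs.FileSystem;

/**
 * Sketch: count bytes once, as they enter the stream, so parts that
 * are retried on the wire are not double-counted (unlike counting on
 * the server/upload side).
 */
class ClientSideCountingOutputStream extends OutputStream {

  private final OutputStream wrapped;
  private final FileSystem.Statistics statistics;

  ClientSideCountingOutputStream(OutputStream wrapped,
      FileSystem.Statistics statistics) {
    this.wrapped = wrapped;
    this.statistics = statistics;
  }

  @Override
  public void write(int b) throws IOException {
    wrapped.write(b);
    if (statistics != null) {
      statistics.incrementBytesWritten(1);
    }
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    wrapped.write(b, off, len);
    if (statistics != null) {
      statistics.incrementBytesWritten(len);
    }
  }
}
{code}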

thodemoor Thomas Demoor added a comment -

Patch 001: synchronous implementation (blocks on every partUpload). A little extra info is provided in info-S3AFastOutputStream-sync.md.

Additional remarks:

1. This patch is simply to set the stage and kick off the discussion. I am working on an async version (multiple concurrent partUploads), which I will post as soon as possible.
2. I would really like to bump up the aws-sdk version, but in some other JIRA this was said to be a gargantuan task (probably HTTP client versions conflicting with other libraries? Azure?). Is that correct?
3. I also renamed partSizeThreshold to the correct term, multiPartThreshold (creating a separate issue seemed overkill).
thodemoor Thomas Demoor added a comment -

          Overview

          Indeed, I do not immediately see any data integrity regressions. The current file-based implementation does not leverage the durability provided by the disks.

The key differences between object stores and HDFS to keep in mind here are that an object is immutable (thus no appends) and that an object only "starts existing" after close() returns (no block-by-block transfer, and no reading of earlier blocks while the last block is still being written, as in HDFS).

One can mitigate the need for buffer space through MultipartUpload, by realizing that we can start uploading before the file is closed: more precisely, once the buffer reaches the partSizeThreshold, one can initiate a MultipartUpload and start uploading parts. This will (probably) require use of the low-level AWS API instead of the currently used high-level API (TransferManager), and we will thus need to do some bookkeeping (part number, etc.) ourselves.

A rough algorithm (a code sketch follows this list)

Write commands append to an in-memory "buffer" (ByteArrayOutputStream?).
If close() is called while buffer.size < partSizeThreshold, do a regular upload (using TransferManager?).
Else, once a write causes buffer.size >= partSizeThreshold:

• initiate a multipart upload: upload the current buffer contents as parts (of partSize) and "resize" the buffer to length = partSize
• subsequent writes fill up the buffer; whenever partSize is exceeded, transfer a part
• close() flushes the remaining buffer, waits for all parts to be uploaded, and completes the MultipartUpload
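
A minimal sketch of this buffering logic, under stated assumptions: the class is hypothetical, the abstract methods are placeholders for the low-level AWS multipart calls, and part-number bookkeeping plus the exact threshold/partSize interaction at the crossover point are simplified.

{code}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Sketch of the rough algorithm above. The abstract methods stand in
 * for the low-level AWS API calls (InitiateMultipartUpload, UploadPart,
 * PutObject, CompleteMultipartUpload).
 */
public abstract class MemoryBufferedS3OutputStream extends OutputStream {

  private final int multiPartThreshold; // switch to multipart above this
  private final int partSize;           // target size of each uploaded part
  private ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private boolean multiPartStarted = false;

  protected MemoryBufferedS3OutputStream(int multiPartThreshold, int partSize) {
    this.multiPartThreshold = multiPartThreshold;
    this.partSize = partSize;
  }

  @Override
  public void write(int b) throws IOException {
    buffer.write(b);
    if (!multiPartStarted && buffer.size() >= multiPartThreshold) {
      initiateMultiPartUpload();
      multiPartStarted = true;
    }
    if (multiPartStarted && buffer.size() >= partSize) {
      // Simplification: the first flushed part may exceed partSize if
      // the threshold is larger than the part size.
      uploadPart(buffer.toByteArray());
      buffer = new ByteArrayOutputStream(partSize); // "resize" the buffer
    }
  }

  @Override
  public void close() throws IOException {
    if (!multiPartStarted) {
      putObject(buffer.toByteArray());    // small file: single regular upload
    } else {
      if (buffer.size() > 0) {
        uploadPart(buffer.toByteArray()); // flush the final, short part
      }
      completeMultiPartUpload();          // the object "starts existing" here
    }
  }

  protected abstract void initiateMultiPartUpload() throws IOException;

  protected abstract void uploadPart(byte[] part) throws IOException;

  protected abstract void putObject(byte[] object) throws IOException;

  protected abstract void completeMultiPartUpload() throws IOException;
}
{code}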

DESIGN DECISION: Transferring parts could block (enabling buffer re-use) or be asynchronous with a threadpool (one buffer per thread -> requires more memory), as TransferManager does. For the following I assume the former.

          Memory usage

Maximum amount of used memory = (number of open files) x max(partSizeThreshold, partSize)
The minimum partSize (and partSizeThreshold) is 5 MB, but evidently bigger parts increase throughput. For example, with partSizeThreshold = partSize = 100 MB, ten concurrently open files could pin up to 1 GB of heap. How many files could one expect to be open at the same time for different sensible (for instance, not HBase) use cases? We should provide documentation helping users choose values for partSizeThreshold and partSize according to their use case and available memory. Furthermore, we probably want to keep both implementations around and introduce a config setting to choose between them.

          I will submit a patch that lays out these ideas so we have a starting point to kick off the discussion.

stevel@apache.org Steve Loughran added a comment -

Given that the cached-to-HDD data is lost if the client fails, there's no enhanced risk to data integrity here; it even improves cleanup.

It will require a JVM set up with many GB of buffer space, though; correct? Or does the AWS API add an append operation for incremental writes?


  People

    • Assignee: thodemoor Thomas Demoor
    • Reporter: thodemoor Thomas Demoor
    • Votes: 1
    • Watchers: 11
