Hadoop Common / HADOOP-14520

WASB: Block compaction for Azure Block Blobs

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.4
    • Fix Version/s: 2.9.0, 3.0.0-beta1
    • Component/s: fs/azure
    • Labels:
      None
    • Release Note:
Block Compaction for Azure Block Blobs. When the number of blocks in a block blob exceeds 32,000, the compaction process replaces a sequence of small blocks with one big block.

      Description

      Block Compaction for WASB allows uploading new blocks for every hflush/hsync call. When the number of blocks exceeds 32,000, the next hflush/hsync triggers the block compaction process. Block compaction replaces a sequence of blocks with one block: from all the sequences with total length less than 4 MB, compaction chooses the longest one. It is a greedy algorithm that preserves all potential candidates for the next round. Block Compaction for WASB increases data durability and allows using block blobs instead of page blobs. By default, block compaction is disabled. Similar to the configuration for page blobs, the client needs to specify the HDFS folders where block compaction over block blobs is enabled.
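      A minimal sketch of the selection rule above (illustrative names; the real logic lives in BlockBlobAppendStream): among all runs of consecutive blocks whose combined size stays under 4 MB, pick the longest run and replace it with a single block. Per the hadoop-azure index.md updated by this patch, compaction is opted in per directory via fs.azure.block.blob.with.compaction.dir, analogous to fs.azure.page.blob.dir for page blobs.

```java
import java.util.List;

/** Sketch of the greedy candidate selection: find the longest run of
 *  consecutive blocks whose total size stays under 4 MB. */
final class CompactionSelector {
  private static final long MAX_COMPACTED_SIZE = 4L * 1024 * 1024;

  /** Returns {start, endExclusive} of the best run, or null if no run of
   *  two or more blocks fits under the limit. */
  static int[] selectRun(List<Long> blockSizes) {
    int bestStart = -1, bestEnd = -1;
    int start = 0;
    long runSize = 0;
    for (int end = 0; end < blockSizes.size(); end++) {
      runSize += blockSizes.get(end);
      // shrink the window until the run fits under the 4 MB limit again
      while (runSize >= MAX_COMPACTED_SIZE && start <= end) {
        runSize -= blockSizes.get(start++);
      }
      if (end - start > bestEnd - bestStart) { // longest run wins
        bestStart = start;
        bestEnd = end;
      }
    }
    // a "sequence" only helps if it merges at least two blocks
    return bestEnd > bestStart ? new int[] {bestStart, bestEnd + 1} : null;
  }
}
```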

      Results for HADOOP_14520_07.patch
      tested endpoint: fs.azure.account.key.hdfs4.blob.core.windows.net
      Tests run: 777, Failures: 0, Errors: 0, Skipped: 155

      1. HADOOP_14520_07.patch
        87 kB
        Georgi Chalakov
      2. HADOOP_14520_08.patch
        88 kB
        Georgi Chalakov
      3. HADOOP_14520_09.patch
        88 kB
        Georgi Chalakov
      4. HADOOP_14520_10.patch
        91 kB
        Georgi Chalakov
      5. HADOOP-14520-006.patch
        93 kB
        Thomas Marquardt
      6. HADOOP-14520-008.patch
        90 kB
        Steve Loughran
      7. HADOOP-14520-009.patch
        92 kB
        Steve Loughran
      8. HADOOP-14520-05.patch
        104 kB
        Georgi Chalakov
      9. hadoop-14520-branch-2-010.patch
        92 kB
        Georgi Chalakov
      10. HADOOP-14520-patch-07-08.diff
        19 kB
        Steve Loughran
      11. HADOOP-14520-patch-07-09.diff
        25 kB
        Steve Loughran


          Activity

          jnp Jitendra Nath Pandey added a comment -

          Fix version 2.0.6-alpha didn't look right, changed it to 2.9. Steve Loughran

          Georgi Georgi Chalakov added a comment -

          Thanks for the review Steve!

          stevel@apache.org Steve Loughran added a comment -

          +1 for branch-2 patch

          committed to branch-2, except for the big logging @ debug in testing. That shouldn't be in either branch, not until needed. Maybe it's time to review the azure/test/resources log4j file & comment out those logging levels that successful test runs don't need

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
                branch-2 Compile Tests
          +1 mvninstall 7m 13s branch-2 passed
          +1 compile 0m 21s branch-2 passed
          +1 checkstyle 0m 16s branch-2 passed
          +1 mvnsite 0m 25s branch-2 passed
          +1 findbugs 0m 37s branch-2 passed
          +1 javadoc 0m 16s branch-2 passed
                Patch Compile Tests
          +1 mvninstall 0m 22s the patch passed
          +1 compile 0m 19s the patch passed
          -1 javac 0m 19s hadoop-tools_hadoop-azure generated 3 new + 5 unchanged - 0 fixed = 8 total (was 5)
          +1 checkstyle 0m 13s the patch passed
          +1 mvnsite 0m 23s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 47s the patch passed
          +1 javadoc 0m 13s the patch passed
                Other Tests
          +1 unit 2m 19s hadoop-azure in the patch passed.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          15m 43s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:eaf5c66
          JIRA Issue HADOOP-14520
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12886163/hadoop-14520-branch-2-010.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 302bf5eab3e0 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2 / 1421196
          Default Java 1.7.0_131
          findbugs v3.0.0
          javac https://builds.apache.org/job/PreCommit-HADOOP-Build/13214/artifact/patchprocess/diff-compile-javac-hadoop-tools_hadoop-azure.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/13214/testReport/
          modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/13214/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Georgi Georgi Chalakov added a comment -

          Thanks for the review Steve!

          I have attached the patch for branch-2: hadoop-14520-branch-2-010.patch
          Results from endpoint: fs.azure.account.key.hdfs4.blob.core.windows.net
          Tests run: 774, Failures: 0, Errors: 0, Skipped: 131

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12811 (See https://builds.apache.org/job/Hadoop-trunk-Commit/12811/)
          HADOOP-14520. WASB: Block compaction for Azure Block Blobs. Contributed (stevel: rev 13eda5000304099d1145631f9be13ce8a00b600d)

          • (edit) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java
          • (add) hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azure/TestNativeAzureFileSystemBlockCompaction.java
          • (edit) hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azure/TestAzureConcurrentOutOfBandIo.java
          • (edit) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/SecureStorageInterfaceImpl.java
          • (edit) hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azure/MockStorageInterface.java
          • (edit) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeFileSystemStore.java
          • (edit) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/AzureNativeFileSystemStore.java
          • (edit) hadoop-tools/hadoop-azure/src/test/resources/log4j.properties
          • (edit) hadoop-tools/hadoop-azure/src/site/markdown/index.md
          • (edit) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/SyncableDataOutputStream.java
          • (edit) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/BlockBlobAppendStream.java
          • (edit) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/StorageInterfaceImpl.java
          • (edit) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/StorageInterface.java
          • (edit) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/SelfRenewingLease.java
          stevel@apache.org Steve Loughran added a comment -

          LGTM
          +1

          committed to trunk & branch-3. If you want to backport to branch-2, apply the patch to branch-2, run the azure tests, and if all is well, attach the patch to this JIRA with a name like hadoop-14520-branch-2-010.patch; once submitted, the patch will be tested against branch-2 instead

          stevel@apache.org Steve Loughran added a comment -

          FWIW, moving to Call<> would be the best story, especially as it lines up for Java 8. But it would be a complicated change to what is a big enough patch as it is.

          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 22s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
                trunk Compile Tests
          +1 mvninstall 15m 4s trunk passed
          +1 compile 0m 20s trunk passed
          +1 checkstyle 0m 14s trunk passed
          +1 mvnsite 0m 21s trunk passed
          +1 findbugs 0m 29s trunk passed
          +1 javadoc 0m 14s trunk passed
                Patch Compile Tests
          +1 mvninstall 0m 20s the patch passed
          +1 compile 0m 18s the patch passed
          +1 javac 0m 18s the patch passed
          +1 checkstyle 0m 13s hadoop-tools/hadoop-azure: The patch generated 0 new + 84 unchanged - 2 fixed = 84 total (was 86)
          +1 mvnsite 0m 21s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 54s the patch passed
          +1 javadoc 0m 12s the patch passed
                Other Tests
          +1 unit 2m 8s hadoop-azure in the patch passed.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          23m 29s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:71bbb86
          JIRA Issue HADOOP-14520
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12884813/HADOOP_14520_10.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 215e83c15b7f 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / bac4e8c
          Default Java 1.8.0_144
          findbugs v3.1.0-RC1
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/13149/testReport/
          modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/13149/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Georgi Georgi Chalakov added a comment - edited

          Thank you for adding all these fixes. Stream capabilities looks like a useful feature.

          Re: flush()
          FSDataOutputStream doesn't override flush(), so a normal flush() call at the application level would not execute BlockBlobAppendStream::flush(). When compaction is disabled, hflush/hsync are no-ops, and the performance of BlockBlobAppendStream for all operations is the same as (or better than) before.

          Re: more than one append stream
          We take a lease on the blob, which means that at any point in time only one append stream can be open. If we had more than one append stream open at the same time, we couldn't guarantee the order of write operations.

          I have added an hsync() call and made isclosed volatile.

          Re: close()
          I think the first exception is the best indication of what went wrong. After an exception, close() is just best effort. I don't know how useful it would be for a client to continue after an IO-related exception, but if that is necessary, the client can continue. If block compaction is enabled, the client can go and read all the data up to the last successful hflush()/hsync(). When block compaction is disabled, we guarantee nothing: we may or may not have the data stored in the service.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
                trunk Compile Tests
          +1 mvninstall 14m 7s trunk passed
          +1 compile 0m 20s trunk passed
          +1 checkstyle 0m 15s trunk passed
          +1 mvnsite 0m 22s trunk passed
          +1 findbugs 0m 31s trunk passed
          +1 javadoc 0m 15s trunk passed
                Patch Compile Tests
          +1 mvninstall 0m 20s the patch passed
          +1 compile 0m 18s the patch passed
          +1 javac 0m 18s the patch passed
          -0 checkstyle 0m 12s hadoop-tools/hadoop-azure: The patch generated 11 new + 84 unchanged - 2 fixed = 95 total (was 86)
          +1 mvnsite 0m 18s the patch passed
          -1 whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          +1 findbugs 0m 38s the patch passed
          +1 javadoc 0m 12s the patch passed
                Other Tests
          +1 unit 2m 5s hadoop-azure in the patch passed.
          +1 asflicense 0m 15s The patch does not generate ASF License warnings.
          21m 45s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:71bbb86
          JIRA Issue HADOOP-14520
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12884788/HADOOP-14520-009.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 58361e6c6d2a 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / d4417da
          Default Java 1.8.0_144
          findbugs v3.1.0-RC1
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/13148/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-azure.txt
          whitespace https://builds.apache.org/job/PreCommit-HADOOP-Build/13148/artifact/patchprocess/whitespace-eol.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/13148/testReport/
          modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/13148/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          stevel@apache.org Steve Loughran added a comment -

          Patch 009

          • fixed checkstyle complaints in production code
          • fixed checkstyle complaints in test src where appropriate
          • noticed that NativeAzureFsOutputStream.close() didn't guarantee that out==null at the end. fixed
          • tweaked imports in BlockBlobAppendStream so that new com.microsoft imports are alongside existing ones
          • The IDE warned that some codepaths in the test cases may not set the variables tested. Unlikely, but I added asserts to fail meaningfully on this.

          Tested: TestNativeAzureFileSystemBlockCompaction against mock FS.

          I note we don't have any functional tests... TestNativeAzureFileSystemAppend could do this, either as is
          or with a subclass which tweaks the config before running the test; it'd need some hsync() calls in the various operations too, just to help force that codepath to work.

          Actually, that raises an interesting question. What if there is more than one append stream open on the same or different hosts, and they both decide to do a compaction? The API lets them do this, right?

          Trying to acquire a lease for a compaction & failing shouldn't propagate up; it could just be swallowed & tried again later.
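
          A sketch of that suggestion (hypothetical helper, not the patch code): treat a failed lease acquisition as "skip compaction this round" rather than an error on the write path.

```java
import java.io.IOException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Sketch: a lease that cannot be acquired means "skip compaction this
 *  round", never an error surfaced to the caller of hflush()/hsync(). */
final class CompactionLease {
  private static final Logger LOG =
      LoggerFactory.getLogger(CompactionLease.class);

  /** Stand-in for the blob lease API used by the store. */
  interface LeaseSource {
    AutoCloseable acquireLease() throws IOException;
  }

  static void maybeCompact(LeaseSource blob, Runnable compaction) {
    try (AutoCloseable lease = blob.acquireLease()) {
      compaction.run(); // we hold the lease, so compaction is safe
    } catch (Exception e) {
      // Swallowed: compaction is only an optimization; retry on a later sync.
      LOG.debug("Skipping compaction; could not acquire lease", e);
    }
  }
}
```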

          stevel@apache.org Steve Loughran added a comment -

          Patch 07-09 is the diff between patch 07 and patch 09.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
                trunk Compile Tests
          +1 mvninstall 13m 58s trunk passed
          +1 compile 0m 20s trunk passed
          +1 checkstyle 0m 17s trunk passed
          +1 mvnsite 0m 22s trunk passed
          +1 findbugs 0m 30s trunk passed
          +1 javadoc 0m 15s trunk passed
                Patch Compile Tests
          +1 mvninstall 0m 18s the patch passed
          +1 compile 0m 16s the patch passed
          +1 javac 0m 16s the patch passed
          -0 checkstyle 0m 13s hadoop-tools/hadoop-azure: The patch generated 22 new + 84 unchanged - 2 fixed = 106 total (was 86)
          +1 mvnsite 0m 18s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 33s the patch passed
          -1 javadoc 0m 11s hadoop-azure in the patch failed.
                Other Tests
          +1 unit 2m 2s hadoop-azure in the patch passed.
          +1 asflicense 0m 15s The patch does not generate ASF License warnings.
          21m 23s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:71bbb86
          JIRA Issue HADOOP-14520
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12884768/HADOOP-14520-008.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux c983acaae13e 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / d4417da
          Default Java 1.8.0_144
          findbugs v3.1.0-RC1
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/13147/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-azure.txt
          javadoc https://builds.apache.org/job/PreCommit-HADOOP-Build/13147/artifact/patchprocess/patch-javadoc-hadoop-tools_hadoop-azure.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/13147/testReport/
          modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/13147/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          stevel@apache.org Steve Loughran added a comment -

          Review of Patch 007 (which comes with some code diffs attached in the form of patch 008)

          I actually reviewed this by applying it to the IDE and looking @ the diffs there, then doing the build. This meant it was easy enough to do some of the minor tweaks while I was reading it, with the IDE telling me what to do, and to step around what's going on. I think I understand what's going on now, so I'm worrying about things like close() and the semantics of the flush operations.

          BlockBlobAppendStream

          L490: flush() isn't synced, but is checking the non-volatile closed field set in close() and elsewhere. Either closed should be volatile (easiest), or do things with an AtomicBoolean. As discussed, you don't want flush to be blocking writes. (More on flush later.)

          L162 mark UploadCommand as abstract (fixed)
          L207: javadocs say "last exception", but given how it's set, it's not quite that, is it? (fixed)
          L223 javadoc: duplicate "block" (fixed)
          L455 replace is null test with Preconditions.checkArgument() (done)
          L595 add whitespace around "==" (done)
          L599, L606, L640, L834 use isEmpty() (done)
          L363, L705, L751 add whitespace when building up text (done)

          I've factored the setting of firstError into maybeSetFirstError(); throwing it in maybeThrowFirstError, which
          notes whether or not it has been raised.

          BlockBlobAppendStream.close() robustness

          I think close() needs a bit of work to be somewhat more robust to things going wrong.

          1. With my patch, it only throws firstError if it's not already hit a problem. So, say, if a write fails, you catch it and call close(); it doesn't bother rethrowing any follow-on exceptions. Is that the right thing to do? Because while it won't rethrow the previous exception, it means that any new exception raised during lease.free() isn't going to be raised.
          2. If an exception gets raised in close() before closed is set, operations like write() and flush can still be attempted. Indeed, you can even try calling close() again. This can explicitly happen if there's a timeout awaiting the thread pool to terminate, and it could also happen if there's some code error.

          What could be done here?

          1. I think the closed field should go to AtomicBoolean; the first bit of close() is to atomically get & set it, and return if close had already taken place.
          2. Maybe, when close() is entered, if the firstException has already been thrown, reset the field & the flag, so that any new exception raised during close is then picked up and thrown.

          This is a complicated enough issue that I'm not changing it myself; I think we need to look at what the possibilities of failure are and how best to report them while doing a best-effort shutdown.
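
          For reference, a rough sketch of option 1 above, assuming closed becomes an AtomicBoolean (illustrative names, not the patch code):

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch of close-once semantics with an AtomicBoolean "closed" flag. */
abstract class CloseOnceStream extends java.io.OutputStream {
  private final AtomicBoolean closed = new AtomicBoolean(false);

  @Override
  public void close() throws IOException {
    // Atomic get-and-set: only the first caller runs the shutdown path;
    // repeated or concurrent close() calls return immediately.
    if (closed.getAndSet(true)) {
      return;
    }
    doClose(); // flush pending uploads, free the lease, stop the pool
  }

  /** Guard for write()/flush() once the stream is closed. */
  protected void checkOpen() throws IOException {
    if (closed.get()) {
      throw new IOException("Stream is closed");
    }
  }

  protected abstract void doClose() throws IOException;
}
```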

          StreamCapabilities

          In HADOOP-13327 I'm slowly trying to fix the hadoop codebase so that all streams which support Syncable support the interface
          StreamCapabilities, where they can dynamically tell callers what their sync policy really is.

          See this document

          I've added what I think the stream capabilities should be for this new stream: hflush and hsync are no-ops without compactionEnabled set. But are hsync() and hflush() doing what they are meant to? Because with the current patch, flush() is a fairly expensive call (it uploads the block) whether or not compaction is enabled, while hsync and hflush are no-ops when compaction is disabled.

          What is the behaviour you want? flush() can get called a lot in code; it's in OutputStream, and there's no requirement for implementations to offer any guarantees about visibility and durability. And if compaction is disabled, every flush() call uploads another block, so the blob will soon have too many entries. As it's doing this even when compaction is off, you are in trouble.

          Proposed:

          • flush() is a no-op.
          • hsync() uploads/compacts data when compactionEnabled is set.
          • hflush() calls hsync()
          • hasCapabilities declares hsync and hflush support only

          This way, those apps which really expect hflush to do a full (slow) flush, and hsync to do an even slower durable write, get what they expect; calls to flush() get no guarantees of anything, which is consistent with what OutputStream.flush() promises.
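
          A sketch of the proposed semantics, assuming Hadoop's Syncable (hflush/hsync) and the StreamCapabilities interface from HADOOP-13327; capability names are shown as plain strings and the upload method is a stand-in:

```java
import java.io.IOException;
import org.apache.hadoop.fs.StreamCapabilities;
import org.apache.hadoop.fs.Syncable;

abstract class CompactingStream extends java.io.OutputStream
    implements Syncable, StreamCapabilities {
  private final boolean compactionEnabled;

  CompactingStream(boolean compactionEnabled) {
    this.compactionEnabled = compactionEnabled;
  }

  @Override
  public void flush() {
    // no-op: OutputStream.flush() makes no durability promises
  }

  @Override
  public void hsync() throws IOException {
    if (compactionEnabled) {
      uploadAndMaybeCompact(); // the slow, durable path
    }
  }

  @Override
  public void hflush() throws IOException {
    hsync(); // hflush delegates to hsync
  }

  @Override
  public boolean hasCapability(String capability) {
    // advertise the sync calls only when they actually do something
    return compactionEnabled
        && ("hflush".equals(capability) || "hsync".equals(capability));
  }

  protected abstract void uploadAndMaybeCompact() throws IOException;
}
```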

          Other

          • SelfRenewingLease: static import of constant string, keeps line length down.
          • only need to use single quotes in `code` entries in .md file (done). That was me being confusing in the previous feedback, sorry.

          Testing: tested TestNativeAzureFileSystemBlockCompaction against Azure Ireland. Note that the hasCapabilities() code merits some test coverage, ideally including some hsync/hflush.

          stevel@apache.org Steve Loughran added a comment -

          first attachment: patch-07-08.diff, purely the diff between trunk + the 07 patch (applied with whitespace=fix) and the changes I've made.

          stevel@apache.org Steve Loughran added a comment -

          I'm reviewing this; it's just taking a while as I need to be thorough.

          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
                trunk Compile Tests
          +1 mvninstall 15m 38s trunk passed
          +1 compile 0m 20s trunk passed
          +1 checkstyle 0m 15s trunk passed
          +1 mvnsite 0m 21s trunk passed
          +1 findbugs 0m 30s trunk passed
          +1 javadoc 0m 14s trunk passed
                Patch Compile Tests
          +1 mvninstall 0m 22s the patch passed
          +1 compile 0m 20s the patch passed
          +1 javac 0m 20s the patch passed
          -0 checkstyle 0m 13s hadoop-tools/hadoop-azure: The patch generated 1 new + 84 unchanged - 2 fixed = 85 total (was 86)
          +1 mvnsite 0m 22s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 40s the patch passed
          +1 javadoc 0m 14s the patch passed
                Other Tests
          +1 unit 2m 11s hadoop-azure in the patch passed.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          23m 42s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HADOOP-14520
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12884545/HADOOP_14520_09.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 1001f0af9345 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 4148023
          Default Java 1.8.0_144
          findbugs v3.1.0-RC1
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/13140/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-azure.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/13140/testReport/
          modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/13140/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Georgi Georgi Chalakov added a comment -

          fixes javadoc issues.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
                trunk Compile Tests
          +1 mvninstall 14m 22s trunk passed
          +1 compile 0m 22s trunk passed
          +1 checkstyle 0m 15s trunk passed
          +1 mvnsite 0m 22s trunk passed
          +1 findbugs 0m 32s trunk passed
          +1 javadoc 0m 15s trunk passed
                Patch Compile Tests
          +1 mvninstall 0m 20s the patch passed
          +1 compile 0m 18s the patch passed
          +1 javac 0m 18s the patch passed
          -0 checkstyle 0m 13s hadoop-tools/hadoop-azure: The patch generated 1 new + 84 unchanged - 2 fixed = 85 total (was 86)
          +1 mvnsite 0m 21s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 33s the patch passed
          -1 javadoc 0m 12s hadoop-azure in the patch failed.
                Other Tests
          +1 unit 2m 10s hadoop-azure in the patch passed.
          +1 asflicense 0m 15s The patch does not generate ASF License warnings.
          22m 6s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HADOOP-14520
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12884518/HADOOP_14520_08.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 7c84b180781f 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / fd66a24
          Default Java 1.8.0_144
          findbugs v3.1.0-RC1
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/13138/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-azure.txt
          javadoc https://builds.apache.org/job/PreCommit-HADOOP-Build/13138/artifact/patchprocess/patch-javadoc-hadoop-tools_hadoop-azure.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/13138/testReport/
          modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/13138/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Georgi Georgi Chalakov added a comment -

          HADOOP_14520_08.patch
          whitespace fixes; javadoc fixes.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 30s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
                trunk Compile Tests
          +1 mvninstall 31m 38s trunk passed
          +1 compile 1m 17s trunk passed
          +1 checkstyle 0m 35s trunk passed
          +1 mvnsite 1m 7s trunk passed
          +1 findbugs 1m 25s trunk passed
          +1 javadoc 0m 43s trunk passed
                Patch Compile Tests
          +1 mvninstall 0m 52s the patch passed
          +1 compile 0m 52s the patch passed
          +1 javac 0m 52s the patch passed
          -0 checkstyle 0m 31s hadoop-tools/hadoop-azure: The patch generated 20 new + 84 unchanged - 2 fixed = 104 total (was 86)
          +1 mvnsite 0m 54s the patch passed
          -1 whitespace 0m 0s The patch has 13 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          +1 findbugs 1m 40s the patch passed
          -1 javadoc 0m 33s hadoop-azure in the patch failed.
                Other Tests
          +1 unit 4m 8s hadoop-azure in the patch passed.
          +1 asflicense 0m 38s The patch does not generate ASF License warnings.
          50m 5s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HADOOP-14520
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12884368/HADOOP_14520_07.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux e29c4a79b947 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 26fafc3
          Default Java 1.8.0_144
          findbugs v3.1.0-RC1
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/13125/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-azure.txt
          whitespace https://builds.apache.org/job/PreCommit-HADOOP-Build/13125/artifact/patchprocess/whitespace-eol.txt
          javadoc https://builds.apache.org/job/PreCommit-HADOOP-Build/13125/artifact/patchprocess/patch-javadoc-hadoop-tools_hadoop-azure.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/13125/testReport/
          modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/13125/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Georgi Georgi Chalakov added a comment - edited

          HADOOP_14520_07.patch
          Results : Tests run: 777, Failures: 0, Errors: 0, Skipped: 155

          if you are changing precondition check, I'd recommend StringUtils.isEmpty() for Preconditions.checkArgument(StringUtils.isNotEmpty(aKey));

          Done.

          If fields aren't updated after the constructor, best to set to final (example, compactionEnabled ?).

          Done.

          How long is downloadBlockList going to take in that constructor? More specifically: if compaction is disabled, can that step be skipped?

          downloadBlockList is used for two purposes: 1) to check for block existence and 2) to download the block list.

          If the stream needs a byte buffer, best to use ElasticByteBufferPool as a pool of buffers.

          Done.

          Use StorageErrorCodeStrings as the source of string constants to check for in exception error codes.

          Done.

          Rather than throw IOException(e), I'd prefer more specific (existing ones). That's PathIOException and subclasses, AzureException(e), and the java.io/java.nio ones.

          Done

          When wrapping a StorageException with another IOE, always include the toString value of the wrapped exception. That way, the log message of the top level log retains the underlying problem.

          Done.

          BlockBlobAppendStream.WriteRequest retry logic will retry even on RuntimeExceptions like IllegalArgumentException. Ideally they should be split into recoverable vs non-recoverable ops via a RetryPolicy. Is this an issue to address here though? Overall, with the new operations doing retries, this may be the time to embrace retry policies. Or at least create a JIRA entry on doing so.

          add*Command() will rethrow the last exception. That means the following write() or close() will rethrow the stored exception. It is not going to happen right away, but it will happen before the stream is closed().
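
          If retry policies are embraced, one minimal sketch of the recoverable/non-recoverable split (wrapper and helper names hypothetical; not the patch's code) could look like this:

          import java.util.concurrent.Callable;
          import java.util.concurrent.TimeUnit;

          import org.apache.hadoop.io.retry.RetryPolicies;
          import org.apache.hadoop.io.retry.RetryPolicy;
          import org.apache.hadoop.io.retry.RetryPolicy.RetryAction;

          // Hypothetical wrapper: IOExceptions go through the policy, while
          // RuntimeExceptions such as IllegalArgumentException fail immediately.
          class RetryingUploadSketch {
            private final RetryPolicy policy =
                RetryPolicies.retryUpToMaximumCountWithFixedSleep(
                    3, 500, TimeUnit.MILLISECONDS);

            void runWithRetries(Callable<Void> upload) throws Exception {
              int retries = 0;
              while (true) {
                try {
                  upload.call();
                  return;
                } catch (RuntimeException e) {
                  throw e;                                // non-recoverable
                } catch (Exception e) {
                  RetryAction action = policy.shouldRetry(e, retries++, 0, true);
                  if (action.action != RetryAction.RetryDecision.RETRY) {
                    throw e;                              // policy gave up
                  }
                  Thread.sleep(action.delayMillis);
                }
              }
            }
          }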

          I know java.io.OutputStream is marked as single-thread only, but I know of code (hello HBase!) which means that you must make some of the calls thread safe. HADOOP-11708/HADOOP-11710 covers this issue in CryptoOutputStream. At the very least, flush() must be synchronous with itself, close() & maybe write()

          flush() is synchronous with itself through addFlushCommand(). We do not want flush() to be synchronous with write(): while one thread waits for a flush(), we would like other threads to continue writing.

          I'm unsure about BlockBlobAppendStream.close() waiting for up to 15 minutes for things to complete, but looking @ other blobstore clients, I can see that they are implicitly waiting without any timeout at all. And it's in the existing codebase. But: why was the time limit changed from 10 min to 15? Was this based on test failures? If so, where is the guarantee that a 15 minute wait is always sufficient.

          The change to 15 min was not based on test failures. I have changed the timeout back to 10 min and added a constant.

          Looking at BlockBlobAppendStream thread pooling, I think having a thread pool per output stream is expensive, especially as it has a minimum size of 4; it will ramp up fast. A pool of min=1 max=4 might be less expensive. But really, the stream should be thinking about sharing a pool common to the FS, relying on callbacks to notify it of completion rather than just awaiting pool completion and a shared writeable field.

          I did some tests with YCSB and a pool of min=1, max=4. It is slower and the difference is measurable. Considering how many output streams you usually have per FS, I would like to keep min=4, max=4. The shared pool is a good idea, but I am afraid we would need a bigger change, and in the end I am not sure we will get significant benefits.

          I think the access/use of lastException needs to be made stronger than just a volatile, as it means that code of the form if (lastException!=null) throw lastException isn't thread safe. I know, it's not that harmful provided lastException is never set to null, but I'd still like some isolated get/getAndSet/maybeThrow operations. Similarly, is lastException the best way to propagate failure, as it means that teardown failures are going to get reported ahead of earlier ones during the write itself. Overall, I propose using Callable<> over Runnable. Allows you to throw exceptions & return things, caller gets to pick them up when it chooses to.

          The original code was using Runnable and I didn’t know about Callable. We could create a JIRA to do the change here and for the page blob.
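
          A minimal sketch of the Callable<> approach (class and method names hypothetical): the Callable throws from the worker thread, and the caller collects the failure from the Future when it chooses, instead of polling a shared volatile lastException field.

          import java.io.IOException;
          import java.nio.ByteBuffer;
          import java.util.concurrent.Callable;
          import java.util.concurrent.ExecutionException;
          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;
          import java.util.concurrent.Future;

          class CallableUploaderSketch {
            private final ExecutorService pool = Executors.newFixedThreadPool(4);

            Future<Void> startUpload(final ByteBuffer block) {
              return pool.submit(new Callable<Void>() {
                @Override
                public Void call() throws IOException {
                  uploadBlock(block);        // hypothetical upload, may throw
                  return null;
                }
              });
            }

            void awaitUpload(Future<Void> pending)
                throws IOException, InterruptedException {
              try {
                pending.get();
              } catch (ExecutionException e) {
                // keep the underlying cause visible in the rethrown exception
                throw new IOException("Block upload failed: " + e.getCause(),
                    e.getCause());
              }
            }

            private void uploadBlock(ByteBuffer block) throws IOException {
              // placeholder for the actual PUT of the block
            }
          }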

          Can you do a patch without all the whitespace stripping? It makes the patch too big & very brittle to cherrypick. I know the spaces are wrong, but trying to strip them in a patch creates needless patch conflict. when the patch goes in we'll strip off whitespace on new/changed lines. so it won't get any worse.

          Done

          Do try to get those line lengths under 80, including in comments. I know it seems out of date, and I'd prefer a higher number, but current consensus is 80 except when it's really hard to do. Supporting side-by-side diff comparison is a reason.

          Done

          Don't use import java.util.*; for imports —it's OK for static member import, but not for whole packages. Makes for brittleness across versions, as when a package adds a competing class, things stop building. I'm assuming this was the IDE being helpful here. If it has autoconvert to * on, turn it off, along with "delete trailing whitespace".

          Yep, the IDE was auto converting to *. It is off now.

          and add trailing "." on the first sentence of javadocs, as javadoc complains in their absence.

          Done.

          Use try-with-resources & you can get streams closed automatically

          Done
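
          For reference, a minimal sketch of the pattern, assuming placeholder test fixtures fs, testPath and data; the stream is closed automatically even when a write or assertion throws:

          import org.apache.hadoop.fs.FSDataOutputStream;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          class TryWithResourcesSketch {
            void writeAndFlush(FileSystem fs, Path testPath, byte[] data)
                throws Exception {
              try (FSDataOutputStream out = fs.create(testPath)) {
                out.write(data);
                out.hflush();
              }
            }
          }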

          index.md use `` around method names.

          Done

          index.md Need to emphasize the cost of compaction: a download of segments and upload of a compacted replacement. Only for use when the Azure storage is on the same site/network, given the cost of bandwidth & time. Do provide details on failure guarantees too, like "doesn't ever lose data".

          Done
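
          A hedged sketch of the folder-level opt-in that index.md describes; the property name here follows the patch's documentation and should be verified against the shipped index.md, and the folder value is illustrative:

          import org.apache.hadoop.conf.Configuration;

          class CompactionConfigSketch {
            static Configuration withCompactionDirs(Configuration conf) {
              // Comma-separated HDFS folders where block blobs with
              // compaction are used; assumed property name from the docs.
              conf.set("fs.azure.block.blob.with.compaction.dir", "/hbase/WALs");
              return conf;
            }
          }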

          It would seem good to have some FS statistics tracking compaction, e.g: #of compaction events, bytes downloaded from compaction, bytes uploaded. These can then be used in assertions in the tests, but, most importantly, can be looked at in production to see why things are slow and/or whether compaction is working.

          I don't think compaction statistics would be that interesting, but if you can help me find an FS statistics sample, I will add them. I've tested compaction with YCSB tests on HBase and the highest number of blocks per blob for the WAL files was less than 1000. We turn on compaction after 32000 blocks. Block compaction is a safety net, not a main scenario. Block compaction improves the read operations, and the write cost of block compaction is amortized over such a large number of block operations (> 32000).

          I'm worried about what happens when large buffers have been flushed & then a compaction starts. The size of the buffer needed will be that of sum(size(blocks)), won't it? I don't see any checks on those limits, such as a decision to set a maximum size of a compacted block & break up compactions if the total block count to compact is > that.

          The size of the read buffer will be the maximum size of a blob block: 4M. We search for the best sequence of small blocks that can be replaced with a block of less than 4M.

          Failure handling on the compaction process. Does a failure on a compaction download & upload in blockCompaction() need to fail the entire write process? If it's a transient error it could be overkill. However, if it is a sign that flush() isn't reliably working then the current behaviour is the one to run with.

          Block compaction is triggered when the number of blocks is above 32000. That is very rare, but it helps minimize the chance of reaching the point where no new write operations for this blob are permitted. When compaction is triggered and the compaction fails, it is better to fail the entire write process.

          One thing I'd like (but which won't mandate) is for the stream to count the #of compaction events, bytes compacted and total duration, then provide some @VisibleForTesting @Unstable getters, and print them in the toString() call. That would line things up for moving to FS-level instrumentation, and can be used immediately.
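
          A minimal sketch of such per-stream counters (class, field and method names hypothetical; not the patch's code):

          import java.util.concurrent.atomic.AtomicLong;
          import com.google.common.annotations.VisibleForTesting;

          class CompactionStatsSketch {
            private final AtomicLong compactionEvents = new AtomicLong();
            private final AtomicLong bytesCompacted = new AtomicLong();
            private final AtomicLong compactionDurationMs = new AtomicLong();

            void recordCompaction(long bytes, long durationMs) {
              compactionEvents.incrementAndGet();
              bytesCompacted.addAndGet(bytes);
              compactionDurationMs.addAndGet(durationMs);
            }

            @VisibleForTesting
            long getCompactionEvents() { return compactionEvents.get(); }

            @Override
            public String toString() {
              return String.format(
                  "compactions=%d, bytesCompacted=%d, durationMs=%d",
                  compactionEvents.get(), bytesCompacted.get(),
                  compactionDurationMs.get());
            }
          }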

          BlockBlobAppendStream: L349: use constant in StorageErrorCodeStrings

          Done

          Use org.apache.hadoop.util.DirectBufferPool to pool the buffers; stable code, uses weak refs to ensure GCs will recover free buffers from the pool.

          Before I created the very simple buffer pool, I considered that option. The problem is that later, when we want to upload the buffer with the Azure Storage Client, we cannot (I couldn't) provide the buffer without allocating a new one and memory-copying the data.
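
          A minimal sketch of the round-trip with ElasticByteBufferPool, assuming heap (non-direct) buffers so the backing array can be handed to the storage client without an extra copy; names are illustrative, not the patch's code:

          import java.nio.ByteBuffer;
          import org.apache.hadoop.io.ElasticByteBufferPool;

          class BufferPoolSketch {
            private static final int MAX_BLOCK_SIZE = 4 * 1024 * 1024;
            private final ElasticByteBufferPool pool =
                new ElasticByteBufferPool();

            byte[] borrow() {
              ByteBuffer buf = pool.getBuffer(false /* heap, not direct */,
                  MAX_BLOCK_SIZE);
              return buf.array();   // valid only because the buffer is on-heap
            }

            void giveBack(byte[] data) {
              pool.putBuffer(ByteBuffer.wrap(data));
            }
          }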

          Make sure that blockCompaction uses a buffer from the pool too; I don't think it does right now.

          Well, that was my intention, but I couldn't create an OutputStream with a custom buffer. I chose the other direction, where I can steal the buffer and send it back to the buffer pool. Block compaction is a rare operation, and when a new buffer is allocated, it is going to be reused. If I can find how to create an OutputStream object with a custom buffer, I will change it.

          UploaderThreadFactory: idle thought: would it make sense to include the container ID or container & key in the thread? I don't know of anything else which does this, but it would aid thread dump diagnostics.

          It is a simple change (“ + key”), but the log lines get too long, I think. It is not convenient to read in a terminal window. If you really feel that this is necessary, I will add it.

          SelfRenewingLease L82: use the constants in StorageErrorCodeStrings

          Done.

          Test code: There's no concurrency test, which would be nice. Could one go into TestNativeAzureFileSystemConcurrency

          Can you suggest an existing test that I can use as a sample?
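
          One possible shape, as a hedged sketch rather than an existing Hadoop test: several threads interleave write() and hflush() on one stream, and any worker failure surfaces through Future.get():

          import java.util.ArrayList;
          import java.util.List;
          import java.util.concurrent.CountDownLatch;
          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;
          import java.util.concurrent.Future;
          import org.apache.hadoop.fs.FSDataOutputStream;

          class ConcurrencyTestSketch {
            void concurrentWritesAndFlushes(final FSDataOutputStream out)
                throws Exception {
              final int threads = 4;
              final CountDownLatch start = new CountDownLatch(1);
              ExecutorService pool = Executors.newFixedThreadPool(threads);
              List<Future<?>> results = new ArrayList<>();
              for (int i = 0; i < threads; i++) {
                results.add(pool.submit(new Runnable() {
                  @Override
                  public void run() {
                    try {
                      start.await();
                      byte[] chunk = new byte[1024];
                      for (int j = 0; j < 100; j++) {
                        out.write(chunk);
                        out.hflush();
                      }
                    } catch (Exception e) {
                      throw new RuntimeException(e);
                    }
                  }
                }));
              }
              start.countDown();
              for (Future<?> f : results) {
                f.get();            // surfaces any worker failure to JUnit
              }
              pool.shutdown();
            }
          }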

          Maybe also think about having TestBlockBlobInputStream use this stream as its upload mechanism; insert some flushes through the loop and see what actually happens on larger scale files. The small tests, while nice and fast, don't check things like buffer sizing if you have large blocks to combine.

          I am sorry, I don't understand how TestBlockBlobInputStream can be used here.

          TestNativeAzureFileSystemBlockCompaction: As background, I like to review tests from the following use case "its got a transient jenkins failure and all you have is the stack trace to debug what failed". Which means I expect tests to: preserve all stack traces, add as much diagnostics information in asserts, including text for every simple assertTrue/assertFalse —enough to get an idea what's wrong without pasting the stack in the IDE to find out which specific assert actually failed.

          I am not sure I understand how you recommend changing the code, but I had a similar issue to what you described above. From five runs, one was failing with "Invalid Block List". The assert and the order of the tests helped me to find what was wrong. The order of the inserts is not arbitrary, and verifyBlockList allows finding the issue very close to the point of failure.

          verifyFileData & verifyAppend: I'm not actually sure these work properly if the created file is > the generated test data, and, by swallowing exceptions, they don't actually report underlying failures, merely trigger an assertion failure somewhere in the calling code. I'd replace these entirely with ContractTestUtils.verifyFileContents(), which does report failures and is widely enough used that it's considered stable.

          Done.
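
          For reference, the replacement is a one-liner; this sketch assumes the verifyFileContents(FileSystem, Path, byte[]) signature from the contract test utilities, with testData standing in for the bytes the test wrote:

          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.fs.contract.ContractTestUtils;

          class VerifySketch {
            static void verify(FileSystem fs, Path path, byte[] testData)
                throws Exception {
              // checks length and content, and reports meaningful failures
              // instead of swallowing the underlying exception
              ContractTestUtils.verifyFileContents(fs, path, testData);
            }
          }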

          testCompaction(): once the verify calls rethrow all exceptions, some of the asserts here can be cut; there's a lot of copy-and-paste duplication of the write/write/write/flush/verify sequences; these should be factored out into shared methods.

          Done

          if the stream.toString() call logs the compaction history, then including the stream toString in all asserts would help diagnose problems.

          Compaction history would be proportional to the number of blocks; the default value is 32000 blocks. Even for the tests, where we have a small number of blocks, the history was actually very useful for me. I need not only the history of the compaction but also the history of the write operations. That would be too much to keep in the stream, I think.

          verifyBlockList: don't bother catching & asserting on exception, just throw it all the way up & let JUnit report it.

          Done.

          testCompactionDisabled: use try-with-resources or IOUtils.cleanupWithLogger.

          Done.

          Most of those "is a magic number" complaints are just about common values in the test...if they were pulled out into some shared variables then it'd shut up checkstyle

          Done.

          there is that "15 minutes" constant in production. How about moving that up from an inline constant to a static constant "CLOSE_UPLOAD_DELAY" or similar in the class —so at least it's obvious what the number is for/where the delay is chosen. At some point in the future, if ever felt to be an issue, then it could be made a config option, with all the trouble that ensues.

          Done.

          javadoc is still unhappy. I'm actually surprised that it's not complaining about all the missing "." chars at the end of each sentence ... maybe the latest update to Java 8.x has got javadoc complaining less. Lovely as that may be, we have to worry about Java 9 too, so please: review the diff and add them to the new javadoc comments.

          I have added the missing "." at the end of each sentence.

          Probably a good time to look at the javadocs and make sure that there are {@code } wraps around code snippets, variable names, etc.

          I haven't found issues, but if I have missed something please let me know.

          writeBlockRequestInternal has retry logic that returns the buffer to the pool and then retries using the buffer that it just returned.

          This was a bug. It is fixed now.

          writeBlockRequestInternal is currently returning a byte array originally created by ByteArrayOutputStream to the buffer pool. If this is not clear, look at blockCompaction where it creates ByteArrayOutputStreamInternal, then wraps the underlying byte[] in a ByteBuffer and passes it to writeBlockRequestInternal which returns it to the pool.

          Fixed

          blockCompaction can be refactored to make unit testing easy. For example, extracting out a getBlockSequenceForCompaction function that takes a block list as input and returns a sequence of blocks to be compacted would allow a data-driven unit test to run many different block lists through the algorithm.

          I considered that, but found it would be more valuable to run end-to-end.

          I recommend the following description for the blockCompaction function:

          Done.

          I recommend renaming BlockBlobAppendStream.bufferSize to maxBlockSize. It is the maximum size of a block.

          Done.

          stevel@apache.org Steve Loughran added a comment -

          I've been looking at the current page blob code, and I've seen something else I think may need fixing: how things close.

          • PageBlobOutputStream.close() call is completely synchronized, even during a long upload, which means that until that upload completes, all calls may block
          • I don't see the flush or write calls checking for the stream being open before doing anything, meaning that the only way the state will be checked is internally, in iopool.execute() or elsewhere.

          I think here it'd be good to move to an atomic boolean to manage closed state, and use it as the guard to the operations.

          tmarquardt Thomas Marquardt added a comment -

          I will hand this off to Georgi, as he is returning from vacation Monday. I noticed the following while reviewing the latest patches:

          1) writeBlockRequestInternal has retry logic that returns the buffer to the pool and then retries using the buffer that it just returned.

          2) writeBlockRequestInternal is currently returning a byte array originally created by ByteArrayOutputStream to the buffer pool. If this is not clear, look at blockCompaction where it creates ByteArrayOutputStreamInternal, then wraps the underlying byte[] in a ByteBuffer and passes it to writeBlockRequestInternal which returns it to the pool.

          3) blockCompaction can be refactored to make unit testing easy. For example, extracting out a getBlockSequenceForCompaction function that takes a block list as input and returns a sequence of blocks to be compacted would allow a data-driven unit test to run many different block lists through the algorithm (a sketch follows after this list).

          4) I recommend the following description for the blockCompaction function:

          /**
           * Block compaction is only enabled when the number of blocks exceeds activateCompactionBlockCount.
           * The algorithm searches for the longest sequence of two or more blocks {b1, b2, ..., bn} such that
           * size(b1) + size(b2) + ... + size(bn) < maximum-block-size.  It then downloads the blocks in the
           * sequence, concatenates the data to form a single block, uploads this new block, and updates the block
           * list to replace the sequence of blocks with the new block.
           */
          

          5) I recommend renaming BlockBlobAppendStream.bufferSize to maxBlockSize. It is the maximum size of a block.
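
          A sliding-window sketch of the getBlockSequenceForCompaction idea from (3), following the description in (4); illustrative only, not the patch's code:

          import java.util.List;

          class CompactionSelectorSketch {
            // Given the block sizes, return [start, end) of the longest run
            // of two or more consecutive blocks whose total size stays under
            // maxBlockSize, or null when no run qualifies.
            static int[] getBlockSequenceForCompaction(List<Long> blockSizes,
                long maxBlockSize) {
              int bestStart = -1;
              int bestLen = 0;
              int start = 0;
              long sum = 0;
              for (int end = 0; end < blockSizes.size(); end++) {
                sum += blockSizes.get(end);
                while (sum >= maxBlockSize && start <= end) {
                  sum -= blockSizes.get(start++);  // shrink window to fit
                }
                int len = end - start + 1;
                if (len >= 2 && len > bestLen) {
                  bestLen = len;
                  bestStart = start;
                }
              }
              return bestLen >= 2
                  ? new int[] {bestStart, bestStart + bestLen} : null;
            }
          }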

          stevel@apache.org Steve Loughran added a comment -

          Production code is looking pretty good, so I've just gone through the tests in detail too now. Sorry.

          1. size of buffers/compaction blocks

          I'm worried about what happens when large buffers have been flushed & then a compaction starts. The size of the buffer needed will be that of sum(size(blocks)), won't it? I don't see any checks on those limits, such as a decision to set a maximum size of a compacted block & break up compactions if the total block count to compact is > that.

          2. Failure handling on the compaction process. Does a failure on a compaction download & upload in blockCompaction() need to fail the entire write process? If it's a transient error it could be overkill. However, if it is a sign that flush() isn't reliably working then the current behaviour is the one to run with.

          3. One thing I'd like (but which won't mandate) is for the stream to count the #of compaction events, bytes compacted and total duration, then provide some @VisibleForTesting @Unstable getters, and print them in the toString() call. That would line things up for moving to FS-level instrumentation, and can be used immediately.

          BlockBlobAppendStream:

          • L349: use constant in StorageErrorCodeStrings
          • Use org.apache.hadoop.util.DirectBufferPool to pool the buffers; stable code, uses weak refs to ensure GCs will recover free buffers from the pool.
          • Make sure that blockCompaction uses a buffer from the pool too; I don't think it does right now.
          • UploaderThreadFactory: idle thought: would it make sense to include the container ID or container & key in the thread? I don't know of anything else which does this, but it would aid thread dump diagnostics.

          SelfRenewingLease

          L82: use the constants in StorageErrorCodeStrings

          Test code

          • There's no concurrency test, which would be nice. Could one go into TestNativeAzureFileSystemConcurrency?
          • Maybe also think about having TestBlockBlobInputStream use this stream as its upload mechanism; insert some flushes through the loop and see what actually happens on larger scale files. The small tests, while nice and fast, don't check things like buffer sizing if you have large blocks to combine.

          TestNativeAzureFileSystemBlockCompaction

          As background, I like to review tests from the following use case "its got a transient jenkins failure and all you have is the stack trace to debug what failed". Which means I expect tests to: preserve all stack traces, add as much diagnostics information in asserts, including text for every simple assertTrue/assertFalse —enough to get an idea what's wrong without pasting the stack in the IDE to find out which specific assert actually failed.

          verifyFileData & verifyAppend:

          I'm not actually sure these work properly if the created file is > the generated test data, and, by swallowing exceptions, they don't actually report underlying failures, merely trigger an assertion failure somewhere in the calling code.

          I'd replace these entirely with ContractTestUtils.verifyFileContents(), which does report failures and is widely enough used that it's considered stable.

          testCompaction()

          • once the verify calls rethrow all exceptions, some of the asserts here can be cut
          • there's a lot of copy-and-paste duplication of the write/write/write/flush/verify sequences; these should be factored out into shared methods.
          • if the stream.toString() call logs the compaction history, then including the stream toString in all asserts would help diagnose problems.

          other

          • verifyBlockList: don't bother catching & asserting on exception, just throw it all the way up & let JUnit report it.
          • testCompactionDisabled: use try-with-resources or IOUtils.cleanupWithLogger.

          checkstyle

          1. Most of those "is a magic number" complaints are just about common values in the test...if they were pulled out into some shared variables then it'd shut up checkstyle
          2. there is that "15 minutes" constant in production. How about moving that up from an inline constant to a static constant "CLOSE_UPLOAD_DELAY" or similar in the class —so at least it's obvious what the number is for/where the delay is chosen. At some point in the future, if ever felt to be an issue, then it could be made a config option, with all the trouble that ensues.
          3. javadoc is still unhappy. I'm actually surprised that it's not complaining about all the missing "." chars at the end of each sentence ... maybe the latest update to Java 8.x has got javadoc complaining less. Lovely as that may be, we have to worry about Java 9 too, so please: review the diff and add them to the new javadoc comments.
          4. Probably a good time to look at the javadocs and make sure that there are {@code } wraps around code snippets, variable names, etc.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
                trunk Compile Tests
          +1 mvninstall 13m 54s trunk passed
          +1 compile 0m 20s trunk passed
          +1 checkstyle 0m 15s trunk passed
          +1 mvnsite 0m 23s trunk passed
          +1 findbugs 0m 30s trunk passed
          +1 javadoc 0m 15s trunk passed
                Patch Compile Tests
          +1 mvninstall 0m 19s the patch passed
          +1 compile 0m 17s the patch passed
          +1 javac 0m 17s the patch passed
          -0 checkstyle 0m 13s hadoop-tools/hadoop-azure: The patch generated 64 new + 82 unchanged - 4 fixed = 146 total (was 86)
          +1 mvnsite 0m 22s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 39s the patch passed
          -1 javadoc 0m 14s hadoop-azure in the patch failed.
                Other Tests
          +1 unit 2m 10s hadoop-azure in the patch passed.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          21m 43s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HADOOP-14520
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12882298/HADOOP-14520-006.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 7a4f3a8eac5b 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 1f04cb4
          Default Java 1.8.0_144
          findbugs v3.1.0-RC1
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/13058/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-azure.txt
          javadoc https://builds.apache.org/job/PreCommit-HADOOP-Build/13058/artifact/patchprocess/patch-javadoc-hadoop-tools_hadoop-azure.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/13058/testReport/
          modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/13058/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          tmarquardt Thomas Marquardt added a comment -

          Attaching HADOOP-14520-006.patch.

          Thanks for the feedback. I've updated the patch to address your feedback and feedback from my own review.

          All tests are passing against my tmarql3 account:

          Tests run: 776, Failures: 0, Errors: 0, Skipped: 155

          tmarquardt Thomas Marquardt added a comment -

          I started working on an update yesterday, since Georgi is on vacation. I'll provide an updated patch and review soon.

          stevel@apache.org Steve Loughran added a comment -

          This is quite a big patch; I've had to learn my way round bits of code while reviewing it. For that reason alone, I'm not knowledgeable enough to be the sole reviewer. What I have done is gone through the code & tried to understand what it's working with, comments below. The bad news is, because it's code I'm not familiar with, (a) my comments go further than just this patch and (b) I may be utterly wrong. Bear that in mind.

          Here's my first review, though it's not detailed enough.

          AzureNativeFileSystemStore

          • It looks like AzureNativeFileSystemStore.getDirectorySet() doesn't trim whitespace from paths. Created HADOOP-14778 to deal with it separately.

          BlockBlobAppendStream

          • if you are changing precondition check, I'd recommend StringUtils.isEmpty() for
          Preconditions.checkArgument(StringUtils.isNotEmpty(aKey));
          Preconditions.checkArgument(bufferSize >= 0);
          
          • If fields aren't updated after the constructor, best to set to final (example, compactionEnabled ?).
          • How long is downloadBlockList going to take in that constructor? More specifically: if compaction is disabled, can that step be skipped?
          • If the stream needs a byte buffer, best to use ElasticByteBufferPool as a pool of buffers.

          Exception handling, wrapping, rethrowing

          • Use StorageErrorCodeStrings as the source of string constants to check for in exception error codes.
          • Rather than throw IOException(e), I'd prefer more specific (existing ones). That's PathIOException and subclasses, AzureException(e), and the java.io/java.nio ones. Whichever is closest to what's actually gone wrong. IOEs are too generic to use in try/catch.
          • When wrapping a StorageException with another IOE, always include the toString value of the wrapped exception.
            That way, the log message of the top level log retains the underlying problem.

          Example from UploadBlockCommand; instead of:

           throw new IOException("Encountered Exception while committing append blocks", ex);

          prefer:

           throw new IOException("Encountered Exception while committing append blocks: " + ex, ex);
          
          • BlockBlobAppendStream.WriteRequest retry logic will retry even on RuntimeExceptions like IllegalArgumentException. Ideally they should be split into recoverable vs non-recoverable ops via a RetryPolicy. Is this an issue to address here though?

          Overall, with the new operations doing retries, this may be the time to embrace retry policies. Or at least create a JIRA entry on doing so.

          Concurrency

          1. I know java.io.OutputStream is marked as single-thread only, but I know of code (hello HBase!) which means that you must make some of the calls thread safe. HADOOP-11708/HADOOP-11710 covers this issue in CryptoOutputStream. At the very least, flush() must be synchronous with itself, close() & maybe write()
          2. I'm unsure about BlockBlobAppendStream.close() waiting for up to 15 minutes for things to complete, but looking @ other blobstore clients, I can see that they are implicitly waiting without any timeout at all. And it's in the existing codebase. But: why was the time limit changed from 10 min to 15? Was this based on test failures? If so, where is the guarantee that a 15 minute wait is always sufficient.
          3. Looking at BlockBlobAppendStream thread pooling, I think having a thread pool per output stream is expensive, especially as it has a minimum size of 4; it will ramp up fast. A pool of min=1 max=4 might be less expensive. But really, the stream should be thinking about sharing a pool common to the FS, relying on callbacks to notify it of completion rather than just awaiting pool completion and a shared writeable field.
          4. I think the access/use of lastException needs to be made stronger than just a volatile, as it means that code of the form if (lastException!=null) throw lastException isn't thread safe. I know, it's not that harmful provided lastException is never set to null, but I'd still like some isolated get/getAndSet/maybeThrow operations.
          5. Similarly, is lastException the best way to propagate failure, as it means that teardown failures are going to get reported ahead of earlier ones during the write itself.

          Overall, I propose using Callable<> over Runnable. Allows you to throw exceptions & return things, caller gets to pick them up when it chooses to.

          code style

          Checkstyle has a lot of complaints (which will need a resubmit to show).

          • Can you do a patch without all the whitespace stripping? It makes the patch too big & very brittle to cherrypick. I know the spaces are wrong, but trying to strip them in a patch creates needless patch conflict. when the patch goes in we'll strip off whitespace on new/changed lines. so it won't get any worse.
          • Do try to get those line lengths under 80, including in comments. I know it seems out of date, and I'd prefer a higher number, but current consensus is 80 except when it's really hard to do. Supporting side-by-side diff comparison is a reason.
          • Don't use import java.util.*; for imports —it's OK for static member import, but not for whole packages. Makes for brittleness across versions, as when a package adds a competing class, things stop building. I'm assuming this was the IDE being helpful here. If it has autoconvert to * on, turn it off, along with "delete trailing whitespace".
          • and add trailing "." on the first sentence of javadocs, as javadoc complains in their absence.

          Testing/ TestNativeAzureFileSystemBlockCompaction

          1. I should warn that in HADOOP-14553 I'm trying to support parallel test runs in the hadoop-azure module; if it goes in first then this patch will need modification to parallelise, at least if a non-mock test is used.
          2. Why don't verifyFileData and verifyAppend throw up any exceptions raised?
          3. Use try-with-resources & you can get streams closed automatically.

          index.md

          • use `` around method names.
          • Need to emphasize the cost of compaction: a download of segments and upload of a compacted replacement. Only for use when the Azure storage is on the same site/network, given the cost in bandwidth & time. Do provide details on failure guarantees too, like "doesn't ever lose data".

          Other points

          It would seem good to have some FS statistics tracking compaction, e.g. number of compaction events, bytes downloaded for compaction, bytes uploaded. These can then be used in assertions in the tests but, most importantly, can be looked at in production to see why things are slow and/or whether compaction is working.
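
          A minimal sketch of such counters (purely illustrative; real code would presumably hang off the existing hadoop-azure instrumentation rather than standalone AtomicLongs):

            import java.util.concurrent.atomic.AtomicLong;

            class CompactionStats {
              final AtomicLong compactionEvents = new AtomicLong();
              final AtomicLong bytesDownloaded = new AtomicLong();
              final AtomicLong bytesUploaded = new AtomicLong();

              void recordCompaction(long downloaded, long uploaded) {
                compactionEvents.incrementAndGet();
                bytesDownloaded.addAndGet(downloaded);
                bytesUploaded.addAndGet(uploaded);
              }
            }

          A test could then assert compactionEvents.get() == 1 after a forced compaction, and production metrics could chart the byte counters.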

          Hide
          shanem Shane Mainali added a comment -

          I have reviewed the latest as well and it looks good to me (my comments were already taken care of in previous patches), thanks Georgi Chalakov!

          Note that Georgi Chalakov also did HBase and other testing for this to validate the changes.

          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 10s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
                trunk Compile Tests
          +1 mvninstall 13m 2s trunk passed
          +1 compile 0m 18s trunk passed
          +1 checkstyle 0m 14s trunk passed
          +1 mvnsite 0m 20s trunk passed
          +1 findbugs 0m 26s trunk passed
          +1 javadoc 0m 13s trunk passed
                Patch Compile Tests
          +1 mvninstall 0m 17s the patch passed
          +1 compile 0m 16s the patch passed
          +1 javac 0m 16s the patch passed
          -0 checkstyle 0m 13s hadoop-tools/hadoop-azure: The patch generated 82 new + 116 unchanged - 4 fixed = 198 total (was 120)
          +1 mvnsite 0m 17s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 32s the patch passed
          +1 javadoc 0m 11s the patch passed
                Other Tests
          +1 unit 1m 21s hadoop-azure in the patch passed.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          19m 25s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HADOOP-14520
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12876335/HADOOP-14520-05.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux d660dd8f2ed9 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / ba5b056
          Default Java 1.8.0_131
          findbugs v3.1.0-RC1
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/12746/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-azure.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12746/testReport/
          modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12746/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 10s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
                trunk Compile Tests
          +1 mvninstall 13m 9s trunk passed
          +1 compile 0m 19s trunk passed
          +1 checkstyle 0m 15s trunk passed
          +1 mvnsite 0m 20s trunk passed
          +1 findbugs 0m 28s trunk passed
          +1 javadoc 0m 14s trunk passed
                Patch Compile Tests
          +1 mvninstall 0m 16s the patch passed
          +1 compile 0m 16s the patch passed
          +1 javac 0m 16s the patch passed
          -0 checkstyle 0m 13s hadoop-tools/hadoop-azure: The patch generated 80 new + 112 unchanged - 4 fixed = 192 total (was 116)
          +1 mvnsite 0m 18s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          -1 findbugs 0m 34s hadoop-tools/hadoop-azure generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0)
          -1 javadoc 0m 11s hadoop-azure in the patch failed.
                Other Tests
          +1 unit 1m 21s hadoop-azure in the patch passed.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          19m 35s



          Reason Tests
          FindBugs module:hadoop-tools/hadoop-azure
            Inconsistent synchronization of org.apache.hadoop.fs.azure.BlockBlobAppendStream.activateCompactionBlockCount; locked 50% of time. Unsynchronized access at BlockBlobAppendStream.java:[line 807]
            Inconsistent synchronization of org.apache.hadoop.fs.azure.BlockBlobAppendStream.bufferSize; locked 60% of time. Unsynchronized access at BlockBlobAppendStream.java:[line 875]
            Inconsistent synchronization of org.apache.hadoop.fs.azure.BlockBlobAppendStream.outBuffer; locked 50% of time. Unsynchronized access at BlockBlobAppendStream.java:[line 413]
            Should org.apache.hadoop.fs.azure.BlockBlobAppendStream$ByteArrayOutputStreamInternal be a static inner class? At BlockBlobAppendStream.java:[lines 527-532]



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HADOOP-14520
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12876296/HADOOP-14520-4.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 9065c56a0341 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / f484a6f
          Default Java 1.8.0_131
          findbugs v3.1.0-RC1
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/12745/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-azure.txt
          findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/12745/artifact/patchprocess/new-findbugs-hadoop-tools_hadoop-azure.html
          javadoc https://builds.apache.org/job/PreCommit-HADOOP-Build/12745/artifact/patchprocess/patch-javadoc-hadoop-tools_hadoop-azure.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12745/testReport/
          modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12745/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 13s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          +1 mvninstall 14m 53s trunk passed
          +1 compile 0m 21s trunk passed
          +1 checkstyle 0m 16s trunk passed
          +1 mvnsite 0m 23s trunk passed
          +1 findbugs 0m 35s trunk passed
          +1 javadoc 0m 15s trunk passed
          +1 mvninstall 0m 19s the patch passed
          +1 compile 0m 17s the patch passed
          +1 javac 0m 17s the patch passed
          -0 checkstyle 0m 13s hadoop-tools/hadoop-azure: The patch generated 75 new + 111 unchanged - 4 fixed = 186 total (was 115)
          +1 mvnsite 0m 20s the patch passed
          -1 whitespace 0m 0s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          -1 findbugs 0m 38s hadoop-tools/hadoop-azure generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)
          -1 javadoc 0m 12s hadoop-azure in the patch failed.
          +1 unit 1m 28s hadoop-azure in the patch passed.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          22m 2s



          Reason Tests
          FindBugs module:hadoop-tools/hadoop-azure
            org.apache.hadoop.fs.azure.BlockBlobAppendStream.commitAppendBlocks() calls Thread.sleep() with a lock held. At BlockBlobAppendStream.java:[line 521]
            org.apache.hadoop.fs.azure.BlockBlobAppendStream.getBufferSize() is unsynchronized, org.apache.hadoop.fs.azure.BlockBlobAppendStream.setBufferSize(int) is synchronized. At BlockBlobAppendStream.java:[line 302]



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HADOOP-14520
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12873986/HADOOP-14520-03.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux b1786b7fc93a 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / c22cf00
          Default Java 1.8.0_131
          findbugs v3.1.0-RC1
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/12595/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-azure.txt
          whitespace https://builds.apache.org/job/PreCommit-HADOOP-Build/12595/artifact/patchprocess/whitespace-eol.txt
          findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/12595/artifact/patchprocess/new-findbugs-hadoop-tools_hadoop-azure.html
          javadoc https://builds.apache.org/job/PreCommit-HADOOP-Build/12595/artifact/patchprocess/patch-javadoc-hadoop-tools_hadoop-azure.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12595/testReport/
          modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12595/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 20s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          +1 mvninstall 15m 19s trunk passed
          +1 compile 0m 19s trunk passed
          +1 checkstyle 0m 17s trunk passed
          +1 mvnsite 0m 24s trunk passed
          +1 findbugs 0m 34s trunk passed
          +1 javadoc 0m 15s trunk passed
          +1 mvninstall 0m 18s the patch passed
          +1 compile 0m 18s the patch passed
          +1 javac 0m 18s the patch passed
          -0 checkstyle 0m 14s hadoop-tools/hadoop-azure: The patch generated 76 new + 111 unchanged - 4 fixed = 187 total (was 115)
          +1 mvnsite 0m 18s the patch passed
          -1 whitespace 0m 0s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          -1 findbugs 0m 42s hadoop-tools/hadoop-azure generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)
          -1 javadoc 0m 14s hadoop-azure in the patch failed.
          +1 unit 1m 41s hadoop-azure in the patch passed.
          +1 asflicense 0m 20s The patch does not generate ASF License warnings.
          22m 52s



          Reason Tests
          FindBugs module:hadoop-tools/hadoop-azure
            org.apache.hadoop.fs.azure.BlockBlobAppendStream.commitAppendBlocks() calls Thread.sleep() with a lock held. At BlockBlobAppendStream.java:[line 522]
            org.apache.hadoop.fs.azure.BlockBlobAppendStream.getBufferSize() is unsynchronized, org.apache.hadoop.fs.azure.BlockBlobAppendStream.setBufferSize(int) is synchronized. At BlockBlobAppendStream.java:[line 303]



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue HADOOP-14520
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12873970/HADOOP-14520-02.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 858c162f95e0 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / c22cf00
          Default Java 1.8.0_131
          findbugs v3.1.0-RC1
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/12593/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-azure.txt
          whitespace https://builds.apache.org/job/PreCommit-HADOOP-Build/12593/artifact/patchprocess/whitespace-eol.txt
          findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/12593/artifact/patchprocess/new-findbugs-hadoop-tools_hadoop-azure.html
          javadoc https://builds.apache.org/job/PreCommit-HADOOP-Build/12593/artifact/patchprocess/patch-javadoc-hadoop-tools_hadoop-azure.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12593/testReport/
          modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12593/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.


            People

            • Assignee: Georgi Chalakov
            • Reporter: Georgi Chalakov
            • Votes: 0
            • Watchers: 8
