HDFS-9260

Improve the performance and GC friendliness of NameNode startup and full block reports

      Description

      This patch changes the data structures used for BlockInfos and Replicas to keep them sorted. This allows faster and more GC-friendly handling of full block reports.

      Would like to hear people's feedback on this change.
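
      For intuition, a minimal sketch of the merge-style diff that sorted per-storage block lists enable during FBR processing (hypothetical method and variable names, not the actual patch code):

          // With both the stored block IDs and the reported block IDs sorted,
          // a full block report can be diffed against the storage in one merge
          // pass instead of per-block hash lookups and list unlinking.
          static void diffSortedReport(long[] reportedIds, long[] storedIds,
                                       List<Long> toAdd, List<Long> toRemove) {
            int r = 0, s = 0;
            while (r < reportedIds.length && s < storedIds.length) {
              if (reportedIds[r] == storedIds[s]) {         // block already known
                r++; s++;
              } else if (reportedIds[r] < storedIds[s]) {   // reported but not stored
                toAdd.add(reportedIds[r++]);
              } else {                                      // stored but no longer reported
                toRemove.add(storedIds[s++]);
              }
            }
            while (r < reportedIds.length) { toAdd.add(reportedIds[r++]); }
            while (s < storedIds.length)   { toRemove.add(storedIds[s++]); }
          }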

      Attachments

      1. FBR processing.png
        179 kB
        Daryn Sharp
      2. HDFS-7435.001.patch
        105 kB
        Staffan Friberg
      3. HDFS-7435.002.patch
        104 kB
        Staffan Friberg
      4. HDFS-7435.003.patch
        104 kB
        Staffan Friberg
      5. HDFS-7435.004.patch
        105 kB
        Staffan Friberg
      6. HDFS-7435.005.patch
        144 kB
        Staffan Friberg
      7. HDFS-7435.006.patch
        144 kB
        Staffan Friberg
      8. HDFS-7435.007.patch
        145 kB
        Staffan Friberg
      9. HDFS-9260.008.patch
        145 kB
        Staffan Friberg
      10. HDFS-9260.009.patch
        145 kB
        Staffan Friberg
      11. HDFS-9260.010.patch
        153 kB
        Staffan Friberg
      12. HDFS-9260.011.patch
        154 kB
        Staffan Friberg
      13. HDFS-9260.012.patch
        150 kB
        Staffan Friberg
      14. HDFS-9260.013.patch
        150 kB
        Staffan Friberg
      15. HDFS-9260.014.patch
        150 kB
        Staffan Friberg
      16. HDFS-9260.015.patch
        150 kB
        Staffan Friberg
      17. HDFS-9260.016.patch
        146 kB
        Staffan Friberg
      18. HDFS-9260.017.patch
        148 kB
        Staffan Friberg
      19. HDFS-9260.018.patch
        147 kB
        Staffan Friberg
      20. HDFSBenchmarks.zip
        25 kB
        Staffan Friberg
      21. HDFSBenchmarks2.zip
        32 kB
        Staffan Friberg
      22. HDFS Block and Replica Management 20151013.pdf
        143 kB
        Staffan Friberg


          Activity

          sfriberg Staffan Friberg added a comment -

          Merged with latest head

          sfriberg Staffan Friberg added a comment -

          Add null check when creating iterator of storages

          walter.k.su Walter Su added a comment -

          Excellent work Staffan Friberg!
          1. Per your tests, the results look good. But I'm more interested in how the new data structure incorporates with HDFS-7836.
          2. To avoid moving entries in DatanodeStorageInfo, an easy way is to make blockReport a serializable HashMap. Instead of iterating over blockReport, we iterate over DatanodeStorageInfo and compare each block against blockReport with O(1) lookups (roughly as sketched below).
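
          A rough sketch of idea 2 using plain java.util collections (reportedBlockIds, storedBlockIds, toAdd, and toRemove are hypothetical placeholders for the report and storage contents):

              // Hash the reported block IDs once, then walk the blocks already
              // stored for the storage and classify each with an O(1) membership
              // check, instead of moving entries around in DatanodeStorageInfo.
              Set<Long> reported = new HashSet<>();
              for (long id : reportedBlockIds) {
                reported.add(id);
              }
              for (long storedId : storedBlockIds) {
                if (!reported.remove(storedId)) {
                  toRemove.add(storedId);          // stored but no longer reported
                }
              }
              toAdd.addAll(reported);              // reported but not yet stored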

          sfriberg Staffan Friberg added a comment -

          Hi Walter

          1)
          I think this part could reduce the need to go off heap. If we still see scalability issues and need to put the block map and BlockInfos off heap, this could potentially serve as an idea for how to structure the data off heap, since even if the data is off heap, continuously updating references will be costly because we will invalidate the CPU cache. Potentially a version of the TreeSet holding primitives (BlockInfo addresses) could be used for fast iteration, but I need to think a bit further about it.

          2)
          Interesting idea. I think the key point would be that you could do quick lookups directly on the serialized data so you don't need to instantiate the whole map, since it might be rather large. Not sure if this is easily doable with ProtoBuf while still keeping the message as compact as possible?

          walter.k.su Walter Su added a comment -

          The biggest part of BlockInfo is the implicit linked lists. If we move the lists to a TreeSet, does it go against the effort of moving BlockInfo off-heap? The TreeMap brings more on-heap references to off-heap BlockInfos. BlocksMap is a reference map that still needs to be off-heap (HDFS-7846). I think the TreeMap would need to be an off-heap set as well, since the set/map have the same number of blocks, and the block count grows. But is it worth it?

          The point of the TreeSet is fast iteration for processing FBRs. As Colin P. McCabe said here, iterating over all the blocks on a given datanode isn't strictly necessary during FBR processing, although iteration is still needed in other use cases.

          walter.k.su Walter Su added a comment -

          Correction: I meant TreeSet where I wrote TreeMap above.

          sfriberg Staffan Friberg added a comment -

          The TreeSet reduces the number of references compared to the doubly-linked list currently built using the triplets data structure.

          If we were to move things off the heap, the TreeSet, if still used, would contain memory addresses (longs) rather than object references, and longs are trivial for the GC to handle (no need to scan the array). Potentially the BlocksMap could work the same way: a large long array on heap that contains the memory addresses of the BlockInfos that are off heap, with collisions handled by the off-heap BlockInfos themselves (linking in the same way they do now).
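
          A very rough sketch of that layout, with all names hypothetical and the actual off-heap access stubbed out behind a small interface:

              // An on-heap table of raw off-heap addresses. The long[] holds no
              // object references, so the GC never scans its entries; hash
              // collisions are chained through a "next" address stored in the
              // off-heap BlockInfo itself.
              interface OffHeapBlockStore {
                long blockIdAt(long addr);   // block ID of the BlockInfo at this address
                long nextAt(long addr);      // address of the next BlockInfo in the chain, 0 if none
              }

              class OffHeapBlockMap {
                private final long[] buckets;          // 0 == empty slot
                private final OffHeapBlockStore store;

                OffHeapBlockMap(int capacity, OffHeapBlockStore store) {
                  this.buckets = new long[capacity];
                  this.store = store;
                }

                /** Returns the off-heap address of the BlockInfo, or 0 if not found. */
                long get(long blockId) {
                  int idx = (int) ((blockId & 0x7fffffffffffffffL) % buckets.length);
                  long addr = buckets[idx];
                  while (addr != 0) {
                    if (store.blockIdAt(addr) == blockId) {
                      return addr;
                    }
                    addr = store.nextAt(addr);         // follow the off-heap chain
                  }
                  return 0;
                }
              }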

          sfriberg Staffan Friberg added a comment -

          All HDFS tests should now pass.

          Fixes include:

          Correctly handle EC blocks, which are negative and need to be masked
          The initial report might sometimes report into a storage already containing new blocks reported by an incremental block report; addStoredLast will fall back on a regular add if the data is not sorted
          Remove debugging output
          Remove unused GSet import
          The invalidate list must hold a Block and not a Replica

          Left to do:
          Handle old nodes that don't send data sorted
          Add a 'sorted' field in the report PB
          Figure out how to handle reports when the cluster contains negative entries that are not EC blocks

          sfriberg Staffan Friberg added a comment -

          Fixed the last todos.

          Handles a new NN and old DNs (unsorted entries); it is inefficient since the NN needs to sort the entries. However, it should only be a problem during the upgrade cycle, and is avoidable if DNs are updated first.

          Added a StorageInfoMonitor thread that can compact the TreeSet if the fill ratio gets too low (a rough sketch of the idea follows below).

          Added a test to check that unsorted entries are handled correctly.
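
          The monitor is roughly the following shape; names, thresholds, and accessors are hypothetical placeholders rather than the actual patch code:

              // Background pass: periodically check each storage's block set and
              // compact it when the fill ratio drops below a threshold.
              class StorageInfoMonitor implements Runnable {
                private static final double MIN_FILL_RATIO = 0.5;     // assumed threshold
                private final long checkIntervalMs = 60_000L;         // assumed interval

                @Override
                public void run() {
                  while (!Thread.currentThread().isInterrupted()) {
                    for (DatanodeStorageInfo storage : getAllStorages()) {    // hypothetical accessor
                      if (storage.getBlockSetFillRatio() < MIN_FILL_RATIO) {  // hypothetical accessor
                        storage.compactBlockSet();                            // hypothetical compaction hook
                      }
                    }
                    try {
                      Thread.sleep(checkIntervalMs);
                    } catch (InterruptedException e) {
                      Thread.currentThread().interrupt();
                      return;
                    }
                  }
                }
              }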

          sfriberg Staffan Friberg added a comment -

          Also handles negative block IDs that are not striped (EC) blocks as efficiently as possible.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 patch 0m 0s The patch command could not apply the patch during dryrun.



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12768368/HDFS-7435.005.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 15eb84b
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/13167/console

          This message was automatically generated.

          sfriberg Staffan Friberg added a comment -

          Merged and diffed again due to a conflict.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 pre-patch 18m 12s Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings.
          -1 @author 0m 0s The patch appears to contain 1 @author tags which the Hadoop community has agreed to not allow in code contributions.
          +1 tests included 0m 0s The patch appears to include 19 new or modified test files.
          +1 javac 7m 55s There were no new javac warning messages.
          -1 javadoc 10m 20s The applied patch generated 3 additional warning messages.
          +1 release audit 0m 26s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 1m 25s The applied patch generated 77 new checkstyle issues (total was 883, now 952).
          -1 whitespace 0m 33s The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 install 1m 29s mvn install still works.
          +1 eclipse:eclipse 0m 35s The patch built with eclipse:eclipse.
          -1 findbugs 2m 33s The patch appears to introduce 3 new Findbugs (version 3.0.0) warnings.
          +1 native 3m 9s Pre-build of native portion
          -1 hdfs tests 54m 34s Tests failed in hadoop-hdfs.
              101m 17s  



          Reason Tests
          FindBugs module:hadoop-hdfs
          Failed unit tests hadoop.hdfs.server.balancer.TestBalancer
          Timed out tests org.apache.hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12768437/HDFS-7435.006.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 15eb84b
          Pre-patch Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/13175/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
          javadoc https://builds.apache.org/job/PreCommit-HDFS-Build/13175/artifact/patchprocess/diffJavadocWarnings.txt
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/13175/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/13175/artifact/patchprocess/whitespace.txt
          Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/13175/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/13175/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/13175/testReport/
          Java 1.7.0_55
          uname Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/13175/console

          This message was automatically generated.

          sfriberg Staffan Friberg added a comment -

          Fixed comments, whitespace, and most of the 80-character line-width warnings.

          zhz Zhe Zhang added a comment -

          Quick note: the patches are all named after HDFS-7435

          zhz Zhe Zhang added a comment -

          Thanks for the great work Staffan.

          The only change related to erasure coding is the below block ID translation, and I think it is done correctly.

          +      long replicaID = replica.getBlockId();
          +      if (BlockIdManager.isStripedBlockID(replicaID)
          +          && (!hasNonEcBlockUsingStripedID ||
          +              !blocksMap.containsBlock(replica))) {
          +        replicaID = BlockIdManager.convertToStripedID(replicaID);
                 }
          

          Will post a full review shortly.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 pre-patch 29m 54s Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 19 new or modified test files.
          +1 javac 12m 32s There were no new javac warning messages.
          +1 javadoc 14m 36s There were no new javadoc warning messages.
          +1 release audit 0m 33s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 2m 3s The applied patch generated 21 new checkstyle issues (total was 883, now 892).
          -1 whitespace 0m 45s The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 install 2m 11s mvn install still works.
          +1 eclipse:eclipse 0m 55s The patch built with eclipse:eclipse.
          -1 findbugs 3m 8s The patch appears to introduce 3 new Findbugs (version 3.0.0) warnings.
          +1 native 3m 36s Pre-build of native portion
          -1 hdfs tests 57m 38s Tests failed in hadoop-hdfs.
              128m 13s  



          Reason Tests
          FindBugs module:hadoop-hdfs
          Failed unit tests hadoop.hdfs.TestReplaceDatanodeOnFailure
          Timed out tests org.apache.hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12768759/HDFS-7435.007.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 123b3db
          Pre-patch Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/13195/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/13195/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/13195/artifact/patchprocess/whitespace.txt
          Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/13195/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/13195/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/13195/testReport/
          Java 1.7.0_55
          uname Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/13195/console

          This message was automatically generated.

          sfriberg Staffan Friberg added a comment -

          Using the right name of the bug on the patch...
          Fixed whitespace and the findbugs warnings.

          The remaining checkstyle warnings should hopefully be OK.

          ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java:60:35: Variable 'storages' must be private and have accessor methods.
          Same as triplets was before

          ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java:204:16: Variable 'storageInfoMonitorThread' must be private and have accessor methods.
          Same as replicationMonitor

          ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java:1: File length is 4,427 lines (max allowed is 2,000).
          ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java:2487: Comment matches to-do format 'TODO:'.
          ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java:2501: Comment matches to-do format 'TODO:'.
          The file was already long, and the TODOs are kept from the earlier version of diffReport.

          ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/TreeSet.java:221:19: Inner assignments should be avoided.
          ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/TreeSet.java:221:28: Inner assignments should be avoided.
          ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/TreeSet.java:221:35: Inner assignments should be avoided.
          ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/TreeSet.java:221:43: Inner assignments should be avoided.
          I can change this to separate statements setting each field to null, but the current version of the clear method is more compact, setting them all to null in one line (both forms illustrated below).
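
          For illustration, the two forms the checkstyle rule is weighing (the field names here are made up, not the actual TreeSet.java fields):

              // Compact form with chained inner assignments, as used in clear():
              root = first = last = null;
              size = 0;

              // Checkstyle-friendly alternative: one assignment per statement.
              root = null;
              first = null;
              last = null;
              size = 0;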

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 pre-patch 20m 25s Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 19 new or modified test files.
          +1 javac 8m 2s There were no new javac warning messages.
          +1 javadoc 10m 36s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 1m 25s The applied patch generated 9 new checkstyle issues (total was 882, now 878).
          +1 whitespace 0m 36s The patch has no lines that end in whitespace.
          +1 install 1m 32s mvn install still works.
          +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
          -1 findbugs 2m 36s The patch appears to introduce 3 new Findbugs (version 3.0.0) warnings.
          +1 native 3m 14s Pre-build of native portion
          -1 hdfs tests 53m 59s Tests failed in hadoop-hdfs.
              103m 27s  



          Reason Tests
          FindBugs module:hadoop-hdfs
          Failed unit tests hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
            hadoop.hdfs.TestDistributedFileSystem
            hadoop.hdfs.server.namenode.ha.TestHASafeMode
            hadoop.hdfs.server.namenode.TestDiskspaceQuotaUpdate
          Timed out tests org.apache.hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12768859/HDFS-9260.008.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 96677be
          Pre-patch Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/13214/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/13214/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
          Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/13214/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/13214/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/13214/testReport/
          Java 1.7.0_55
          uname Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/13214/console

          This message was automatically generated.

          sfriberg Staffan Friberg added a comment -

          Fix for the timed-out test org.apache.hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks.

          Need to remove via the iterator and not from the tree during iteration to avoid a ConcurrentModificationException (see the example below).
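
          For reference, the safe removal pattern in standard java.util terms; blockSet and shouldRemove are placeholders, and the same idea applies to the patch's TreeSet iterator:

              // Removing during iteration must go through the iterator itself;
              // calling blockSet.remove(...) mid-iteration would fail fast with
              // a ConcurrentModificationException.
              Iterator<Long> it = blockSet.iterator();
              while (it.hasNext()) {
                long id = it.next();
                if (shouldRemove(id)) {
                  it.remove();   // removes the current element safely via the iterator
                }
              }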

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 pre-patch 25m 50s Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 19 new or modified test files.
          +1 javac 11m 12s There were no new javac warning messages.
          +1 javadoc 14m 24s There were no new javadoc warning messages.
          +1 release audit 0m 33s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 1m 58s The applied patch generated 9 new checkstyle issues (total was 882, now 878).
          +1 whitespace 0m 42s The patch has no lines that end in whitespace.
          +1 install 2m 4s mvn install still works.
          +1 eclipse:eclipse 0m 45s The patch built with eclipse:eclipse.
          -1 findbugs 3m 40s The patch appears to introduce 3 new Findbugs (version 3.0.0) warnings.
          +1 native 4m 21s Pre-build of native portion
          -1 hdfs tests 69m 7s Tests failed in hadoop-hdfs.
              134m 43s  



          Reason Tests
          FindBugs module:hadoop-hdfs
          Failed unit tests hadoop.hdfs.server.datanode.fsdataset.impl.TestScrLazyPersistFiles
            hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
            hadoop.hdfs.shortcircuit.TestShortCircuitCache
            hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyWriter
            hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles
            hadoop.hdfs.server.namenode.TestFSImageWithAcl
            hadoop.hdfs.TestReplaceDatanodeOnFailure
            hadoop.hdfs.server.namenode.ha.TestEditLogTailer
            hadoop.hdfs.TestEncryptionZones



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12769065/HDFS-9260.009.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 68ce93c
          Pre-patch Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/13233/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/13233/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
          Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/13233/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/13233/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/13233/testReport/
          Java 1.7.0_55
          uname Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/13233/console

          This message was automatically generated.

          sfriberg Staffan Friberg added a comment -

          I have been running through the other failed tests without being able to reproduce them locally. Was able to reproduce the failed test in hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes once on the trunk, but not yet with my branch. So it seems like these might be intermittent issues.

          daryn Daryn Sharp added a comment -

          I'll try to review the patch today. Only skimmed comments, and it's a big change. My initial questions:

          1. Performance. What impact does it have on FBRs? Especially startup.
          2. Time to initialize replication queue.
          3. Time to decommission.
          4. Does memory usage increase or decrease?
          sfriberg Staffan Friberg added a comment -

          Hi Daryn,

          Thanks for taking a look at the patch.

          1. FBR and startup improve; please see the attached PDF.
          2. Will need to check what we do here (and if I still have the old logs), but it doesn't feel like it should be affected.
          3. We will be slightly slower when deleting a file or removing blocks. The current algorithm goes through the LightWeightGSet to first look up/remove each affected BlockInfo, and after that removes it from the linked list. In my case it is removed from the TreeSet, which requires a new lookup. However, while this is slower, I think the time that process takes is far outweighed by the time it takes to delete or redistribute blocks on all DNs. Deleting files with a large number of blocks seems to take on the order of hours since we only send small parts of the total block list to each node on every heartbeat. Not too familiar with how aggressive the redistribution is in the event of a DN decommission.
          4. It will decrease as long as the TreeSet is kept above a ~50% fill ratio, since the reference to each BlockInfo is now a single pointer from the TreeSet instead of the doubly-linked list.

          daryn Daryn Sharp added a comment - - edited

          I've read the doc now. Sorry I commented before doing so. The results are interesting until the final details about a 4x reduction in block updates. Here are some basic specs to consider:

          • 10-80k adds/min
          • job submissions increasing replication factor to 10
          • at least 1 node/day decommissioning or going dead with 100k-400k blocks
          • every few weeks entire racks (40 nodes) are decommissioned for refresh or reallocation
          • balancer is constantly churning to populate recommissioned dead nodes

          That's a lot of IBRs which is why a 4x degradation is quite concerning. The block report processing times seem a bit high in the tests. I'll attach an image of the BR processing times for some of our busiest clusters. They span the gamut from 100M-300M blocks with roughly the same number of files. We got a huge improvement from my BR encoding change + per-storage reports.

          BTW, I had/have a working patch that replaced the triplets with sparse yet densely packed 2-dimensional primitive arrays. Everything is linked via indices to greatly reduce the dirty cards to scan. Need to dig up the jira when my head is above water.

          sfriberg Staffan Friberg added a comment -

          Hi Daryn,

          Thanks for the comments and the additional data points. Interesting to learn more about the scale of HDFS instances. I wonder if the NN was running on older and slower hardware in my case compared to your setup; the cluster I was able to get my hands on for these runs has fairly old machines.

          Adds of new blocks are relatively fast: since they will be at the far right of the tree, the number of lookups will be minimal. However, the current implementation only needs to do around two writes to insert something at the head/end of the list; nothing with a more complicated data structure will be able to match that. It will be a question of trade-offs.

          Also to clarify, the microbenchmarks only measure the actual remove and insert of random values, not the whole process of copying files etc. I would expect the other parts to far outweigh the time it takes to update the data structures, so while the 4x sounds scary it should be a minor part of the whole transaction.

          I think the patch you are referring to is HDFS-6658. I applied it to the 3.0.0 branch from March 11, 2015, which is from when the patch was created, and ran it on the same microbenchmarks I built to test my patch. I will attach the source code for the benchmarks so you can check that I used the right APIs for it to be comparable. From what I can tell the benchmarks should do the same thing on a high level. The performance overhead for adding and removing is similar between our two implementations.

          fbrAllExisting  - Do a Full Block Report with the same 2M entries that are already registered for the Storage in the NN.
          addRemoveBulk   - Remove 32k random blocks from a StorageInfo that has 64k entries, then re-add them all.
          addRemoveRandom - Remove and directly re-add a block from a Storage entry, repeat for 32k blocks from a StorageInfo with 64k blocks
          iterate         - Iterate and get blockID for 64k blocks associated with a particular StorageInfo
          
          ==> benchmarks_trunkMarch11_intMapping.jar.output <==
          Benchmark                          Mode  Cnt    Score   Error  Units
          FullBlockReport.fbrAllExisting     avgt   25  379.659 ± 5.463  ms/op
          StorageInfoAccess.addRemoveBulk    avgt   25   16.426 ± 0.380  ms/op
          StorageInfoAccess.addRemoveRandom  avgt   25   15.401 ± 0.196  ms/op
          StorageInfoAccess.iterate          avgt   25    1.496 ± 0.004  ms/op
          
          ==> benchmarks_trunk_baseline.jar.output <==
          Benchmark                          Mode  Cnt    Score   Error  Units
          FullBlockReport.fbrAllExisting     avgt   25  288.974 ± 3.970  ms/op
          StorageInfoAccess.addRemoveBulk    avgt   25    3.157 ± 0.046  ms/op
          StorageInfoAccess.addRemoveRandom  avgt   25    2.815 ± 0.012  ms/op
          StorageInfoAccess.iterate          avgt   25    0.788 ± 0.006  ms/op
          
          ==> benchmarks_trunk_treeset.jar.output <==
          Benchmark                          Mode  Cnt    Score   Error  Units
          FullBlockReport.fbrAllExisting     avgt   25  231.270 ± 3.450  ms/op
          StorageInfoAccess.addRemoveBulk    avgt   25   11.596 ± 0.521  ms/op
          StorageInfoAccess.addRemoveRandom  avgt   25   11.249 ± 0.101  ms/op
          StorageInfoAccess.iterate          avgt   25    0.385 ± 0.010  ms/op
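
          For context, the attached benchmarks are JMH-based; a stripped-down harness in the same spirit (using java.util.TreeSet<Long> as a stand-in for the patch's TreeSet, so only illustrative) looks roughly like:

              import java.util.Iterator;
              import java.util.Random;
              import java.util.TreeSet;
              import java.util.concurrent.TimeUnit;
              import org.openjdk.jmh.annotations.*;

              @BenchmarkMode(Mode.AverageTime)
              @OutputTimeUnit(TimeUnit.MILLISECONDS)
              @State(Scope.Benchmark)
              public class StorageInfoAccessSketch {
                private final TreeSet<Long> blocks = new TreeSet<>();
                private long[] victims;   // 32k existing IDs to remove and re-add

                @Setup(Level.Iteration)
                public void setup() {
                  blocks.clear();
                  Random rnd = new Random(42);
                  while (blocks.size() < 64 * 1024) {
                    blocks.add(rnd.nextLong());
                  }
                  victims = new long[32 * 1024];
                  Iterator<Long> it = blocks.iterator();
                  for (int i = 0; i < victims.length; i++) {
                    victims[i] = it.next();
                  }
                }

                @Benchmark
                public void addRemoveRandom() {
                  for (long id : victims) {
                    blocks.remove(id);
                    blocks.add(id);
                  }
                }
              }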
          

          Do you have a good suggestion for some other perf test/stress test that would be good to try out? Any stress load you have on your end that would be possible to try it out on?

          sfriberg Staffan Friberg added a comment -

          Microbenchmarks

          sfriberg Staffan Friberg added a comment -

          Added a new benchmark that does IBRs on a BlocksMap/DatanodeStorageInfo that contains 2M entries and deletes/re-adds 20% of those entries. The updates are spread out over multiple IBRs, and each IBR contains between 50 and 350 changed blocks. The IntMapping version is again the patch from HDFS-6658.

          Some further benchmarking of Incremental BR.
          
          ==> benchmarks_trunkMarch11_intMapping.jar.output <==
          
          Benchmark                                  Mode  Cnt     Score    Error  Units
          IncrementalBlockReport.receivedAndDeleted  avgt   50  3969.207 ± 14.979  ms/op
          
          ==> benchmarks_treeset_baseline.jar.output <==
          
          Benchmark                                  Mode  Cnt    Score    Error  Units
          IncrementalBlockReport.receivedAndDeleted  avgt   50  387.936 ± 25.634  ms/op
          
          ==> benchmarks_treeset.jar.output <==
          
          Benchmark                                  Mode  Cnt     Score    Error  Units
          IncrementalBlockReport.receivedAndDeleted  avgt   50  1205.779 ± 75.464  ms/op
          
          sfriberg Staffan Friberg added a comment -

          Avoid LinkedList allocations

          cmccabe Colin P. McCabe added a comment -

          Thanks for working on this, Staffan Friberg. It looks promising.

           >   @Override
           >   boolean removeStorage(DatanodeStorageInfo storage) {
           >     int dnIndex = findStorageInfoFromEnd(storage);
           >     if (dnIndex < 0) { // the node is not found
           >       return false;
           >     }
           >     // set the triplet to null
           >     setStorageInfo(dnIndex, null);
           >     indices[dnIndex] = -1;
           >     return true;
           >   }
          

          This still refers to "the triplet" but there are no more triplets, right? There are some other comments referencing "triplets" in BlockInfoStriped that should be fixed as well.

          What is the strategy for shrinking BlockInfo#storages? It seems like right now setStorageInfo(<index>, null) will create a "hole" in the array, but it is never actually shrunk.

           >      // Remove here for now as removeStoredBlock will do it otherwise
           >      // and cause concurrent modification exception
          

          This comment could be clearer. How about "we must remove the block via the iterator"?

           >       if (shouldPostponeBlocksFromFuture) {
           >         // If the block is an out-of-date generation stamp or state,
           >         // but we're the standby, we shouldn't treat it as corrupt,
           >         // but instead just queue it for later processing.
           >         // TODO: Pretty confident this should be s/storedBlock/block below,
           >         // since we should be postponing the info of the reported block, not
           >         // the stored block. See HDFS-6289 for more context.
           >         queueReportedBlock(storageInfo, storedBlock, reportedState,
           >             QUEUE_REASON_CORRUPT_STATE);
           >       } else {
          

          If we're really confident that this should be "block" rather than "storedBlock", let's fix it.

          @@ -122,8 +91,9 @@ BlockInfo addBlockCollection(BlockInfo b, BlockCollection bc) {
              */
             void removeBlock(Block block) {
               BlockInfo blockInfo = blocks.remove(block);
          -    if (blockInfo == null)
          +    if (blockInfo == null) {
                 return;
          +    }
          ...
          @@ -191,8 +177,9 @@ int numNodes(Block b) {
              */
             boolean removeNode(Block b, DatanodeDescriptor node) {
               BlockInfo info = blocks.get(b);
          -    if (info == null)
          +    if (info == null) {
                 return false;
          +    }
          

          Let's try to avoid "no-op" changes like this in this patch, since it's already pretty big. We can fix whitespace and so forth in other JIRAs to avoid creating confusion about what was changed here.

                return set != null ? set.get(blockId, LONG_AND_BLOCK_COMPARATOR) : null;
          

          This might be simpler as:

          if (set == null) {
              return null;
          }
          return set.get(blockId, LONG_AND_BLOCK_COMPARATOR);
          
             /**
              * Add a replica's meta information into the map
              *
              * @param bpid block pool id
              * @param replicaInfo a replica's meta information
          -   * @return previous meta information of the replica
          +   * @return true if inserted into the set
              * @throws IllegalArgumentException if the input parameter is null
              */
          -  ReplicaInfo add(String bpid, ReplicaInfo replicaInfo) {
          +  boolean add(String bpid, ReplicaInfo replicaInfo) {
          

          I would like to see some clear comments in this function on what happens if there is already a copy of the replicaInfo in the ReplicaMap. I might be wrong, but based on my reading of TreeSet.java, it seems like the new entry won't be added, which is a behavior change from what we did earlier. Unless I'm missing something, this doesn't seem quite right since the new ReplicaInfo might have a different genstamp, etc.

          sfriberg Staffan Friberg added a comment -

          Thanks for the comments, Colin P. McCabe! I made all of your suggested changes except the two below, which needed some further discussion.

           >       if (shouldPostponeBlocksFromFuture) {
           >         // If the block is an out-of-date generation stamp or state,
           >         // but we're the standby, we shouldn't treat it as corrupt,
           >         // but instead just queue it for later processing.
           >         // TODO: Pretty confident this should be s/storedBlock/block below,
           >         // since we should be postponing the info of the reported block, not
           >         // the stored block. See HDFS-6289 for more context.
           >         queueReportedBlock(storageInfo, storedBlock, reportedState,
           >             QUEUE_REASON_CORRUPT_STATE);
           >       } else {
          

          If we're really confident that this should be "block" rather than "storedBlock", let's fix it.

          This comment is copied verbatim from the method "processAndHandleReportedBlock" in the same class and is not mine (it doesn't show up in the diff since I didn't edit that method). I kept it as part of the structure since I wanted to make sure the algorithm behaves in the same way, so it might be best to address it in a separate bug.

             /**
              * Add a replica's meta information into the map
              *
              * @param bpid block pool id
              * @param replicaInfo a replica's meta information
          -   * @return previous meta information of the replica
          +   * @return true if inserted into the set
              * @throws IllegalArgumentException if the input parameter is null
              */
          -  ReplicaInfo add(String bpid, ReplicaInfo replicaInfo) {
          +  boolean add(String bpid, ReplicaInfo replicaInfo) {
          

          I would like to see some clear comments in this function on what happens if there is already a copy of the replicaInfo in the ReplicaMap. I might be wrong, but based on my reading of TreeSet.java, it seems like the new entry won't be added, which is a behavior change from what we did earlier. Unless I'm missing something, this doesn't seem quite right since the new ReplicaInfo might have a different genstamp, etc.

          Yes, this is a change in behavior compared to earlier. I started down this path since add on a Set doesn't replace, which unfortunately doesn't match what the Map API does. I added a "replace" method to the class to be used when replace behavior is needed and went through the code to ensure the right method is called in each case. I'm not really happy about this choice; perhaps a cleaner way would be to have an addWithReplace method on the TreeSet and keep the old add behavior of the ReplicaMap. I believe that would reduce the size of the patch and only add one "ugly" method on the TreeSet.

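          For illustration, here is a minimal sketch of what "add with replace" semantics look like on a sorted set, mirroring Map.put. This is not the patch code; the class and method names below are made up, and the real change would add an equivalent method to the patch's own TreeSet rather than to the java.util collections.

          import java.util.NavigableSet;

          final class SortedSetUtil {

            /**
             * Insert element, evicting any existing entry that compares equal to it,
             * and return the evicted entry (or null). Plain Set.add would instead
             * keep the old entry and silently drop the new one.
             */
            static <T> T addOrReplace(NavigableSet<T> set, T element) {
              T previous = set.floor(element);
              boolean equalEntryPresent = previous != null
                  && (set.comparator() != null
                      ? set.comparator().compare(previous, element) == 0
                      : ((Comparable<? super T>) previous).compareTo(element) == 0);
              if (equalEntryPresent) {
                set.remove(previous);   // evict the old entry so the new one wins
              } else {
                previous = null;
              }
              set.add(element);
              return previous;
            }
          }

          With a helper like this, replace-style call sites stay explicit while every other call site keeps the plain add semantics.
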
          sfriberg Staffan Friberg added a comment -

          What is the strategy for shrinking BlockInfo#storages? It seems like right now setStorageInfo(<index>, null) will create a "hole" in the array, but it is never actually shrunk.

          Forgot to comment on this one. I don't believe the storages array was ever reduced in size; it could only grow if the replication factor was increased. Only BlockInfoStriped will have (and support having) holes in the array.
          BlockInfoContiguous will move the last entry into the slot being deleted and clear the last slot when calling removeStorage, and it only adds to the end of the array in addStorage.

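          As a rough illustration of that dense-array invariant, here is a self-contained sketch; it is not the actual BlockInfoContiguous code, and all names are placeholders.

          /**
           * Sketch of "move the last entry into the freed slot" removal, which keeps
           * the array dense so no holes ever appear in the contiguous case.
           */
          class DenseStorageArraySketch {

            private final Object[] storages = new Object[3];
            private int size;

            boolean addStorage(Object storage) {
              if (size == storages.length) {
                return false;              // a real implementation would grow the array
              }
              storages[size++] = storage;  // always append at the end
              return true;
            }

            boolean removeStorage(Object storage) {
              int index = indexOf(storage);
              if (index < 0) {
                return false;              // storage not found
              }
              // Overwrite the removed slot with the last entry and clear the last slot.
              storages[index] = storages[size - 1];
              storages[size - 1] = null;
              size--;
              return true;
            }

            private int indexOf(Object storage) {
              for (int i = 0; i < size; i++) {
                if (storages[i] == storage) {
                  return i;
                }
              }
              return -1;
            }
          }
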
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 19 new or modified test files.
          +1 mvninstall 8m 1s trunk passed
          +1 compile 0m 44s trunk passed with JDK v1.8.0_66
          +1 compile 0m 46s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 27s trunk passed
          +1 mvnsite 0m 55s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 2m 1s trunk passed
          +1 javadoc 1m 11s trunk passed with JDK v1.8.0_66
          +1 javadoc 1m 53s trunk passed with JDK v1.7.0_91
          +1 mvninstall 0m 49s the patch passed
          +1 compile 0m 42s the patch passed with JDK v1.8.0_66
          +1 cc 0m 42s the patch passed
          +1 javac 0m 42s the patch passed
          +1 compile 0m 42s the patch passed with JDK v1.7.0_91
          +1 cc 0m 42s the patch passed
          +1 javac 0m 42s the patch passed
          -1 checkstyle 0m 26s hadoop-hdfs-project/hadoop-hdfs: patch generated 7 new + 844 unchanged - 12 fixed = 851 total (was 856)
          +1 mvnsite 0m 52s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          -1 findbugs 2m 16s hadoop-hdfs-project/hadoop-hdfs generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0)
          +1 javadoc 1m 7s the patch passed with JDK v1.8.0_66
          +1 javadoc 1m 48s the patch passed with JDK v1.7.0_91
          -1 unit 62m 6s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
          -1 unit 63m 15s hadoop-hdfs in the patch failed with JDK v1.7.0_91.
          +1 asflicense 0m 23s Patch does not generate ASF License warnings.
          153m 21s



          Reason Tests
          FindBugs module:hadoop-hdfs-project/hadoop-hdfs
            org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo$1.next() can't throw NoSuchElementException At BlockInfo.java:At BlockInfo.java:[line 117]
            Load of known null value in org.apache.hadoop.hdfs.util.TreeSet.add(Object) At TreeSet.java:in org.apache.hadoop.hdfs.util.TreeSet.add(Object) At TreeSet.java:[line 616]
            Class org.apache.hadoop.hdfs.util.TreeSet$Node implements Cloneable but does not define or use clone method At TreeSet.java:does not define or use clone method At TreeSet.java:[lines 51-223]
          JDK v1.8.0_66 Failed junit tests hadoop.hdfs.TestRollingUpgrade
            hadoop.hdfs.server.datanode.TestBlockScanner
            hadoop.hdfs.server.namenode.TestNNThroughputBenchmark
          JDK v1.8.0_66 Timed out junit tests org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
          JDK v1.7.0_91 Failed junit tests hadoop.hdfs.server.namenode.TestDecommissioningStatus
            hadoop.hdfs.server.datanode.TestBlockScanner
            hadoop.hdfs.server.namenode.TestNNThroughputBenchmark



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12783609/HDFS-9260.011.patch
          JIRA Issue HDFS-9260
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc
          uname Linux ab9da37357f9 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 4992398
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14186/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/14186/artifact/patchprocess/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14186/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14186/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14186/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14186/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14186/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Max memory used 77MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14186/console

          This message was automatically generated.

          cmccabe Colin P. McCabe added a comment -

          This comment is simply copied from the method "processAndHandleReportedBlock" in the same class and not mine (doesn't show up since I didn't edit that method). I kept it as part of the structure since I wanted to make sure the algorithm behaves in the same way. So might be best to address it in a separate bug.

          That's fair.

          Yes this is a change in behavior compared to earlier. Started down this path since add on a Set doesn't replace, which unfortunately doesn't match what the Map API does. I added a "replace" method in the class to be used when a replace behavior is needed and went through the code to ensure the right method is called when needed. Not really happy about this choice, perhaps a cleaner way would be to have a addWithReplace method on the TreeSet and keep the old add behavior of the ReplicaMap. I believe it would reduce the size of the patch and only add one "ugly" method on the TreeSet.

          That's a good idea. Yes, please, let's have an addWithReplace method. Otherwise, there is a risk that we will accidentally change the semantics of something in an incorrect way (I notice a lot of cases where "add" turns into "replace" in this patch and it makes me nervous).

          I had some minor conflicts applying it to trunk; I guess it needs a rebase anyway.

          Thanks, Staffan Friberg. Very exciting to see this make progress.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          -1 patch 0m 5s HDFS-9260 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



          Subsystem Report/Notes
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12783927/HDFS-9260.012.patch
          JIRA Issue HDFS-9260
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14210/console

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 19 new or modified test files.
          +1 mvninstall 8m 45s trunk passed
          +1 compile 0m 48s trunk passed with JDK v1.8.0_66
          +1 compile 0m 50s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 28s trunk passed
          +1 mvnsite 1m 4s trunk passed
          +1 mvneclipse 0m 15s trunk passed
          +1 findbugs 2m 15s trunk passed
          +1 javadoc 1m 16s trunk passed with JDK v1.8.0_66
          +1 javadoc 2m 1s trunk passed with JDK v1.7.0_91
          +1 mvninstall 0m 55s the patch passed
          +1 compile 0m 45s the patch passed with JDK v1.8.0_66
          +1 cc 0m 45s the patch passed
          +1 javac 0m 45s the patch passed
          +1 compile 0m 46s the patch passed with JDK v1.7.0_91
          +1 cc 0m 46s the patch passed
          +1 javac 0m 46s the patch passed
          -1 checkstyle 0m 26s hadoop-hdfs-project/hadoop-hdfs: patch generated 4 new + 704 unchanged - 12 fixed = 708 total (was 716)
          +1 mvnsite 1m 0s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 27s the patch passed
          +1 javadoc 1m 18s the patch passed with JDK v1.8.0_66
          +1 javadoc 2m 9s the patch passed with JDK v1.7.0_91
          -1 unit 57m 25s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
          +1 unit 53m 15s hadoop-hdfs in the patch passed with JDK v1.7.0_91.
          +1 asflicense 0m 21s Patch does not generate ASF License warnings.
          141m 31s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.hdfs.TestRollingUpgrade
            hadoop.hdfs.server.datanode.TestDataNodeMetrics
            hadoop.hdfs.server.datanode.TestBlockScanner



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12783978/HDFS-9260.013.patch
          JIRA Issue HDFS-9260
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc
          uname Linux 7b106530e80c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 99829eb
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14220/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14220/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14220/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14220/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Max memory used 77MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14220/console

          This message was automatically generated.

          sfriberg Staffan Friberg added a comment -

          Fixed checkstyle on TreeSet.

          Should I convert the storages field to private? (The triplets field was protected.)

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 19 new or modified test files.
          +1 mvninstall 7m 43s trunk passed
          +1 compile 0m 40s trunk passed with JDK v1.8.0_66
          +1 compile 0m 43s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 24s trunk passed
          +1 mvnsite 0m 51s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 1m 55s trunk passed
          +1 javadoc 1m 7s trunk passed with JDK v1.8.0_66
          +1 javadoc 1m 47s trunk passed with JDK v1.7.0_91
          +1 mvninstall 0m 46s the patch passed
          +1 compile 0m 38s the patch passed with JDK v1.8.0_66
          +1 cc 0m 38s the patch passed
          +1 javac 0m 38s the patch passed
          +1 compile 0m 39s the patch passed with JDK v1.7.0_91
          +1 cc 0m 39s the patch passed
          +1 javac 0m 39s the patch passed
          -1 checkstyle 0m 24s hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 704 unchanged - 12 fixed = 706 total (was 716)
          +1 mvnsite 0m 50s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 4s the patch passed
          +1 javadoc 1m 4s the patch passed with JDK v1.8.0_66
          +1 javadoc 1m 44s the patch passed with JDK v1.7.0_91
          -1 unit 51m 34s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
          -1 unit 50m 34s hadoop-hdfs in the patch failed with JDK v1.7.0_91.
          +1 asflicense 0m 22s Patch does not generate ASF License warnings.
          128m 41s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.datanode.TestBlockScanner
            hadoop.hdfs.server.datanode.TestDataNodeMetrics
          JDK v1.7.0_91 Failed junit tests hadoop.hdfs.server.datanode.TestFsDatasetCache



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784167/HDFS-9260.014.patch
          JIRA Issue HDFS-9260
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc
          uname Linux eab8b9080f93 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 6eacdea
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14233/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14233/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14233/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14233/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14233/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14233/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Max memory used 77MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14233/console

          This message was automatically generated.

          cmccabe Colin P. McCabe added a comment - - edited

          Should I convert storages field to private? (The triplets field was protected)

          That sounds like a good idea. We have functions to manipulate these fields, so the subclasses don't need to directly poke at the fields.

          If that involves code changes to the subclasses, though, let's just do it in a follow-on JIRA. This is one case where checkstyle is not that useful, since it's warning about a problem that already exists before this patch (and isn't even really a "problem," just an infelicity).

          -  public static final String  DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_KEY = "dfs.namenode.replication.max-streams-hard-limit";
          +  public static final String  DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_KEY =
          +      "dfs.namenode.replication.max-streams-hard-limit";
          

          Can we skip this change? It's not really part of this work and it makes the diff bigger.

          +  public static final String DFS_NAMENODE_STORAGEINFO_EFFICIENCY_INTERVAL_KEY
          +      = "dfs.namenode.storageinfo.efficiency.interval";
          +  public static final int DFS_NAMENODE_STORAGEINFO_EFFICIENCY_INTERVAL_DEFAULT
          +      = 600;
          

          Should be something like dfs.namenode.storageinfo.defragmenter.interval.ms to indicate that it's a scan interval, and that it's in milliseconds.

          +    Object[] old = storages;
          +    storages = new DatanodeStorageInfo[(last+num)];
          +    System.arraycopy(old, 0, storages, 0, last);
          

          Now "old" can have type DatanodeStorageInfo[] rather than Object[], right?
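
          For illustration, a self-contained sketch of that growth step with the local typed accordingly; the nested stand-in class and the method name are made up, not the actual BlockInfo code.

          class StorageGrowthSketch {
            // Stand-in for the real DatanodeStorageInfo; illustrative only.
            static final class DatanodeStorageInfo { }

            private DatanodeStorageInfo[] storages = new DatanodeStorageInfo[0];

            void growBy(int last, int num) {
              DatanodeStorageInfo[] old = storages;   // typed local, no Object[] needed
              storages = new DatanodeStorageInfo[last + num];
              System.arraycopy(old, 0, storages, 0, last);
            }
          }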

          +      storageInfoMonitorThread.interrupt();
          

          Maybe this should be something like storageInfoDefragmenterThread? "monitor" suggests something like the block scanner, not defragmentation (at least in my mind?)

          import org.apache.hadoop.hdfs.util.TreeSet;
          

          I think this might be less confusing if you called it ChunkedTreeSet. If I were just a new developer looking at the code (or even an experienced developer), I wouldn't really expect us to be using something called TreeSet which was actually completely different than java.util.TreeSet.

          --- a/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
          +++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
          @@ -246,6 +246,7 @@ message BlockReportRequestProto {
             required string blockPoolId = 2;
             repeated StorageBlockReportProto reports = 3;
             optional BlockReportContextProto context = 4;
          +  optional bool sorted = 5 [default = false];
           }
          

          The other fields in BlockReportRequestProto have comments explaining what they are. Let's add one for "sorted" as well.

          jingzhao Jing Zhao added a comment -

          Thanks for the great work, Staffan Friberg. The patch looks good overall. Some comments and questions:

          1. For DatanodeProtocol#blockReport, can we move "boolean sorted" inside of BlockReportContext and make its default value false? This way we keep backward compatibility between the DN and NN (though allowing an old DN to send reports to the NN can lead to bad performance).
          2. Need to update the javadoc of ReplicaMap#map.
          3. DatanodeStorageInfo#addBlockInitial(BlockInfo) is never called.
          4. In BlockManager#removeBlocksAssociatedTo and removeZombieReplicas, it looks like we're removing the block from the DatanodeStorageInfo twice? Can we modify removeStoredBlock to avoid the second remove op?
                for (DatanodeStorageInfo storage : node.getStorageInfos()) {
                  final Iterator<BlockInfo> it = storage.getBlockIterator();
                  while (it.hasNext()) {
                    BlockInfo block = it.next();
                    // DatanodeStorageInfo must be removed using the iterator to avoid
                    // ConcurrentModificationException in the underlying storage
                    it.remove();
                    removeStoredBlock(block, node);
                  }
                }
            
          5. In reportDiffSorted, since we already convert the replica ID at the beginning, we can pass the replicaID when we call getStoredBlock. Or we can call getStoredBlock directly at the beginning.
                  if (BlockIdManager.isStripedBlockID(replicaID)
                      && (!hasNonEcBlockUsingStripedID ||
                          !blocksMap.containsBlock(replica))) {
                    replicaID = BlockIdManager.convertToStripedID(replicaID);
                  }
            
          6. The following TODO has been deleted from the current trunk.
                if (invalidateBlocks.contains(dn, replica)) {
                  /*
                   * TODO: following assertion is incorrect, see HDFS-2668 assert
                   * storedBlock.findDatanode(dn) < 0 : "Block " + block +
                   * " in recentInvalidatesSet should not appear in DN " + dn;
                   */
                  return;
                }
            
          7. It's better to avoid duplicated code between processAndHandleReportedBlock and reportDiffSortedInner. The logic for processing reported blocks is complicated, and having copies in two different places adds extra maintenance burden.
          8. In TreeSet#removeElementAt, the compaction is currently only done to merge the node into next/prev. Do you also want to check whether we can merge next/prev into the current node if next/prev's size is 1? (It looks like Apache Harmony's implementation has this extra check.)
          9. Maybe we should have an upper bound on the total number of storage compactions done in each iteration? The next iteration could then continue from the stopping point of the previous one (see the sketch after this list). Considering that calculating the fill ratio also needs to go through the whole tree, each compaction iteration will visit at least (total_number_blocks * replication_factor / 64) tree nodes. This may be a big workload, and even in the best case (i.e., no compaction is necessary) the read lock will still be held for a while.
          10. It will also be helpful if you can post performance numbers about the compaction.
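
          A rough sketch of the bound-and-resume idea from point 9, using a plain TreeMap as a stand-in for the chunked structure; all names are made up and the chunk merging itself is omitted, so this only illustrates the scheduling, not the patch's actual defragmenter or TreeSet internals.

          import java.util.Iterator;
          import java.util.Map;
          import java.util.NavigableMap;
          import java.util.TreeMap;

          final class BoundedCompactionSketch {

            // Stand-in for the per-storage chunked structure, keyed by first block id.
            private final NavigableMap<Long, long[]> chunks = new TreeMap<>();
            private Long resumeKey;                  // where the previous pass stopped

            /** Visit at most maxNodesPerPass chunks, then remember where to resume. */
            void compactOnePass(int maxNodesPerPass) {
              NavigableMap<Long, long[]> view =
                  resumeKey == null ? chunks : chunks.tailMap(resumeKey, false);
              Iterator<Map.Entry<Long, long[]>> it = view.entrySet().iterator();
              int visited = 0;
              while (it.hasNext() && visited < maxNodesPerPass) {
                Map.Entry<Long, long[]> entry = it.next();
                visited++;
                resumeKey = entry.getKey();
                // A real implementation would check this chunk's fill ratio here and
                // merge it with a neighbour when underfull; omitted in this sketch.
              }
              if (!it.hasNext()) {
                resumeKey = null;                    // wrapped around; next pass restarts
              }
            }
          }
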
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 19 new or modified test files.
          +1 mvninstall 9m 44s trunk passed
          +1 compile 1m 22s trunk passed with JDK v1.8.0_66
          +1 compile 1m 6s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 36s trunk passed
          +1 mvnsite 1m 12s trunk passed
          +1 mvneclipse 0m 17s trunk passed
          +1 findbugs 2m 32s trunk passed
          +1 javadoc 1m 44s trunk passed with JDK v1.8.0_66
          +1 javadoc 2m 31s trunk passed with JDK v1.7.0_91
          +1 mvninstall 1m 6s the patch passed
          +1 compile 1m 19s the patch passed with JDK v1.8.0_66
          +1 cc 1m 19s the patch passed
          +1 javac 1m 19s the patch passed
          +1 compile 0m 57s the patch passed with JDK v1.7.0_91
          +1 cc 0m 57s the patch passed
          +1 javac 0m 57s the patch passed
          -1 checkstyle 0m 33s hadoop-hdfs-project/hadoop-hdfs: patch generated 6 new + 705 unchanged - 11 fixed = 711 total (was 716)
          +1 mvnsite 1m 11s the patch passed
          +1 mvneclipse 0m 15s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 47s the patch passed
          +1 javadoc 1m 40s the patch passed with JDK v1.8.0_66
          +1 javadoc 2m 30s the patch passed with JDK v1.7.0_91
          -1 unit 97m 7s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
          -1 unit 85m 48s hadoop-hdfs in the patch failed with JDK v1.7.0_91.
          +1 asflicense 0m 24s Patch does not generate ASF License warnings.
          220m 12s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.datanode.TestBlockScanner
            hadoop.hdfs.server.datanode.TestDataNodeMetrics
            hadoop.hdfs.security.TestDelegationTokenForProxyUser
            hadoop.hdfs.TestFileAppend
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
            hadoop.hdfs.server.datanode.TestBlockReplacement
            hadoop.hdfs.server.namenode.TestDecommissioningStatus
            hadoop.hdfs.qjournal.TestSecureNNWithQJM
            hadoop.hdfs.TestEncryptionZones
            hadoop.hdfs.server.namenode.TestCacheDirectives
            hadoop.hdfs.server.datanode.TestDirectoryScanner
          JDK v1.7.0_91 Failed junit tests hadoop.tracing.TestTracing
            hadoop.hdfs.shortcircuit.TestShortCircuitCache
            hadoop.hdfs.TestSafeMode
            hadoop.hdfs.qjournal.TestSecureNNWithQJM
            hadoop.hdfs.server.datanode.TestDirectoryScanner



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784739/HDFS-9260.015.patch
          JIRA Issue HDFS-9260
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc
          uname Linux 732e22df0341 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / fb238d7
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14270/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14270/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14270/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14270/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14270/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14270/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Max memory used 77MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14270/console

          This message was automatically generated.

          sfriberg Staffan Friberg added a comment -

          Thanks for the comments

          Patch 15 (and 16) should address all your comments.

I did not change protected to private, as there is some direct access from the two subclasses.

          sfriberg Staffan Friberg added a comment -

          Hi Jing Zhao,

          Thank you for your comments! Updated with the patch (version 16).

          1. Done, moved to context
          2. Done
          3. Done, removed
          4. I have looked at this multiple times while working on the patch, but have so far failed to find a simple way to separate it. The remove methods are so deeply linked when removing a block that I can't see a clean way to lift it out, and even if it were possible it would itself be a fairly large change. Let me know if you have any ideas.
          5. Done, looking up directly in the map with a new Block(replicaID).
          6. Done, removed
          7. The reason for duplicating it is basically to avoid having the NN allocate 4 LinkedLists for each block being reported in an IBR. Potentially one could change the full BR to not rely on lists and simply add/remove as it finds entries. Two issues need to be thought through for this: how should logging be handled, since some counting is done on the number of handled blocks, and is it better to have multiple loops with a smaller code footprint than to expand the already large loop with even more code to handle each case directly. I agree with you that having the two code paths is bad, but I think the reduction in allocation for IBRs could be worth it.
          8. Done, I do the same checks I do in removeLeft/Right
          9, 10. Good point. Is it required to hold the read lock around the loops, or would it be enough to just hold it around the innermost iteration that calculates the fragmentation for a storage? That would help reduce the time significantly for the first iteration. I need to think a bit more about the second part, the abort mechanism used when actually doing defragmentation. What is an OK time limit? I saw 4ms being mentioned in HDFS-9198.
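
          (For illustration only, a minimal sketch of the narrow-locking idea discussed in 9/10. treeSetFillRatio() and treeSetCompact(timeout) are the helper names visible in the snippet quoted later in this thread; the lock object, threshold and overall loop shape are assumptions, not the patch code.)

          import java.util.List;
          import java.util.concurrent.locks.ReentrantReadWriteLock;
          import org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo;

          class DefragmentSketch {
            // Hold the read lock only for the cheap per-storage fill-ratio check;
            // the compaction itself runs under the write lock with a time budget.
            void defragmentPass(List<DatanodeStorageInfo> storages,
                                ReentrantReadWriteLock lock,
                                double ratioThreshold, long timeoutMs) {
              for (DatanodeStorageInfo storage : storages) {
                double fillRatio;
                lock.readLock().lock();
                try {
                  fillRatio = storage.treeSetFillRatio();
                } finally {
                  lock.readLock().unlock();
                }
                if (fillRatio < ratioThreshold) {
                  lock.writeLock().lock();
                  try {
                    // treeSetCompact returns false on timeout; remaining work is
                    // picked up on the next defragmenter interval.
                    storage.treeSetCompact(timeoutMs);
                  } finally {
                    lock.writeLock().unlock();
                  }
                }
              }
            }
          }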

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 15 new or modified test files.
          +1 mvninstall 6m 49s trunk passed
          +1 compile 0m 41s trunk passed with JDK v1.8.0_66
          +1 compile 0m 43s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 28s trunk passed
          +1 mvnsite 0m 55s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 1m 54s trunk passed
          +1 javadoc 1m 6s trunk passed with JDK v1.8.0_66
          +1 javadoc 1m 41s trunk passed with JDK v1.7.0_91
          +1 mvninstall 0m 46s the patch passed
          +1 compile 0m 38s the patch passed with JDK v1.8.0_66
          +1 cc 0m 38s the patch passed
          +1 javac 0m 38s the patch passed
          +1 compile 0m 39s the patch passed with JDK v1.7.0_91
          +1 cc 0m 39s the patch passed
          +1 javac 0m 39s the patch passed
          -1 checkstyle 0m 26s hadoop-hdfs-project/hadoop-hdfs: patch generated 5 new + 687 unchanged - 11 fixed = 692 total (was 698)
          +1 mvnsite 0m 50s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 5s the patch passed
          +1 javadoc 1m 7s the patch passed with JDK v1.8.0_66
          +1 javadoc 1m 43s the patch passed with JDK v1.7.0_91
          -1 unit 65m 3s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
          -1 unit 63m 43s hadoop-hdfs in the patch failed with JDK v1.7.0_91.
          +1 asflicense 0m 21s Patch does not generate ASF License warnings.
          154m 30s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode
            hadoop.hdfs.server.datanode.TestBlockScanner
          JDK v1.7.0_91 Failed junit tests hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
            hadoop.hdfs.TestDFSUpgradeFromImage



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12785031/HDFS-9260.016.patch
          JIRA Issue HDFS-9260
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc
          uname Linux 9ffa37ca07ee 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 41da9a0
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14282/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14282/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14282/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14282/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14282/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14282/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Max memory used 77MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14282/console

          This message was automatically generated.

          cmccabe Colin P. McCabe added a comment -

          Thanks, Staffan Friberg.

          7. The reason for duplicating it is basically to avoid having the NN allocate 4 LinkedLists for each block being reported in an IBR. Potentially one could change the full BR to not rely on lists and simply add/remove as it finds entries. Two issues need to be thought through for this: how should logging be handled, since some counting is done on the number of handled blocks, and is it better to have multiple loops with a smaller code footprint than to expand the already large loop with even more code to handle each case directly. I agree with you that having the two code paths is bad, but I think the reduction in allocation for IBRs could be worth it.

          Yeah, this seems like something that should be done in a follow-on JIRA.

          +  public static final String DFS_NAMENODE_STORAGEINFO_EFFICIENCY_INTERVAL_MS_KEY
          +      = "dfs.namenode.storageinfo.efficiency.interval.ms";
          

          I appreciate that this now has time units associated with it, but I feel that "defragmenter" should be somewhere in the name. This is basically a configuration key for the defragmenter.
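
          (As a purely hypothetical example of such a rename, following the diff style of the snippet above; the exact constant and property name chosen in the final patch may differ.)

          +  public static final String DFS_NAMENODE_STORAGEINFO_DEFRAGMENT_INTERVAL_MS_KEY
          +      = "dfs.namenode.storageinfo.defragment.interval.ms";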

          jingzhao Jing Zhao added a comment -

          Thanks for updating the patch, Staffan Friberg.

          Yeah, I think we can do #4 and #7 as follow-on.

          For compaction, do you think we can track the total number of nodes in a treeset? We already know the total number of entries (through size); if we can also track the total number of nodes and keep it updated along with the create/delete Node operations, the fill ratio calculation becomes O(1) per tree and the read lock holding time can be greatly decreased. Also, holding the read lock just around the innermost iteration should work, I think.
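
          (A rough sketch of that bookkeeping, with made-up names; the real FoldedTreeSet node layout and the assumed 64-entry node capacity may differ.)

          // Sketch only, not FoldedTreeSet itself: keep a node counter updated on
          // node create/delete so the fill ratio no longer needs a full tree walk.
          abstract class FillRatioSketch {
            static final int NODE_CAPACITY = 64;  // assumed max entries per folded node
            int nodeCount;                        // ++ in createNode(), -- in deleteNode()

            abstract int size();                  // number of entries, already tracked

            double fillRatio() {
              return nodeCount == 0 ? 1.0 : (double) size() / (nodeCount * NODE_CAPACITY);
            }
          }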

          For the second part when doing real compaction, currently I do not have exact time limit number. Maybe we can first use 4ms as mentioned by HDFS-9198. Will be helpful if Daryn Sharp can comment here.

          cmccabe Colin P. McCabe added a comment -

          Hmm. Let's make the compaction time a tunable, and then it can be adjusted if 4ms is too long. I think it's a pretty reasonable default, given that we've used it as such in the past.

          Agree that we could make the compaction more efficient and low-latency. Let's do that in a follow-on JIRA so that we can get this patch into trunk and get some validation of the idea.

          sfriberg Staffan Friberg added a comment -

          Patch 17
          Added timeout as a tunable
          Smaller read-lock region and optimized the fill ratio calculation so not all nodes need to be iterated (we still need to find the last node).
          Updated tunable names as per Colin's suggestion

          cmccabe Colin P. McCabe added a comment -

          Thanks, Staffan Friberg.

            if (!storage.treeSetCompact(storageInfoDefragmentTimeout)) {
              // Compaction timed out, reset iterator to continue with
              // the same storage next iteration.
              i += 2;
            }
            LOG.info("StorageInfo TreeSet defragmented {} : {}",
                     storage.getStorageID(), storage.treeSetFillRatio());
          }
          

          Hmm. The comment says "reset iterator." Is this intended to be subtracting 2 from i, rather than adding? Also, shouldn't we log a message when compaction times out? Right now we log the success message whether or not it actually succeeded, right?
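
          (One hypothetical way to restructure the quoted block so the two outcomes are logged separately; this is just to make the question concrete, not the actual patch.)

          if (storage.treeSetCompact(storageInfoDefragmentTimeout)) {
            LOG.info("StorageInfo TreeSet defragmented {} : {}",
                     storage.getStorageID(), storage.treeSetFillRatio());
          } else {
            LOG.info("StorageInfo TreeSet defragmentation timed out for {}, will retry",
                     storage.getStorageID());
            // adjust the iterator here so the same storage is revisited next pass
          }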

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 15 new or modified test files.
          +1 mvninstall 6m 52s trunk passed
          +1 compile 0m 45s trunk passed with JDK v1.8.0_66
          +1 compile 0m 42s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 30s trunk passed
          +1 mvnsite 0m 53s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 1m 58s trunk passed
          +1 javadoc 1m 7s trunk passed with JDK v1.8.0_66
          +1 javadoc 1m 50s trunk passed with JDK v1.7.0_91
          +1 mvninstall 0m 47s the patch passed
          +1 compile 0m 38s the patch passed with JDK v1.8.0_66
          +1 cc 0m 38s the patch passed
          +1 javac 0m 38s the patch passed
          +1 compile 0m 39s the patch passed with JDK v1.7.0_91
          +1 cc 0m 39s the patch passed
          +1 javac 0m 39s the patch passed
          -1 checkstyle 0m 26s hadoop-hdfs-project/hadoop-hdfs: patch generated 3 new + 686 unchanged - 11 fixed = 689 total (was 697)
          +1 mvnsite 0m 50s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 5s the patch passed
          +1 javadoc 1m 3s the patch passed with JDK v1.8.0_66
          +1 javadoc 1m 42s the patch passed with JDK v1.7.0_91
          -1 unit 54m 40s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
          -1 unit 51m 38s hadoop-hdfs in the patch failed with JDK v1.7.0_91.
          +1 asflicense 0m 21s Patch does not generate ASF License warnings.
          132m 17s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.datanode.TestBlockScanner
          JDK v1.7.0_91 Failed junit tests hadoop.hdfs.TestEncryptedTransfer



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12785269/HDFS-9260.017.patch
          JIRA Issue HDFS-9260
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc
          uname Linux 96e52adeec51 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 772ea7b
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14294/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14294/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14294/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14294/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14294/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14294/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Max memory used 77MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14294/console

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 15 new or modified test files.
          0 mvndep 0m 12s Maven dependency ordering for branch
          +1 mvninstall 12m 21s trunk passed
          +1 compile 1m 51s trunk passed with JDK v1.8.0_66
          +1 compile 1m 13s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 45s trunk passed
          +1 mvnsite 1m 35s trunk passed
          +1 mvneclipse 0m 19s trunk passed
          +1 findbugs 3m 18s trunk passed
          +1 javadoc 2m 18s trunk passed with JDK v1.8.0_66
          +1 javadoc 3m 16s trunk passed with JDK v1.7.0_91
          0 mvndep 0m 13s Maven dependency ordering for patch
          +1 mvninstall 1m 30s the patch passed
          +1 compile 1m 51s the patch passed with JDK v1.8.0_66
          +1 cc 1m 51s the patch passed
          +1 javac 1m 51s the patch passed
          +1 compile 1m 21s the patch passed with JDK v1.7.0_91
          +1 cc 1m 21s the patch passed
          +1 javac 1m 21s the patch passed
          -1 checkstyle 0m 48s hadoop-hdfs-project/hadoop-hdfs: patch generated 3 new + 684 unchanged - 10 fixed = 687 total (was 694)
          +1 mvnsite 1m 38s the patch passed
          +1 mvneclipse 0m 20s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 3m 44s the patch passed
          +1 javadoc 2m 28s the patch passed with JDK v1.8.0_66
          +1 javadoc 3m 25s the patch passed with JDK v1.7.0_91
          -1 unit 141m 52s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
          -1 unit 127m 21s hadoop-hdfs in the patch failed with JDK v1.7.0_91.
          +1 asflicense 0m 36s Patch does not generate ASF License warnings.
          319m 11s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.datanode.TestBlockScanner
            hadoop.hdfs.server.namenode.ha.TestEditLogTailer
            hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations
            hadoop.hdfs.TestRecoverStripedFile
            hadoop.hdfs.security.TestDelegationTokenForProxyUser
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
            hadoop.hdfs.server.datanode.TestBlockReplacement
            hadoop.fs.TestSymlinkHdfsFileContext
            hadoop.hdfs.qjournal.TestSecureNNWithQJM
            hadoop.hdfs.server.namenode.TestFileTruncate
            hadoop.hdfs.TestEncryptionZones
            hadoop.hdfs.server.namenode.ha.TestRequestHedgingProxyProvider
            hadoop.hdfs.server.datanode.TestDirectoryScanner
          JDK v1.7.0_91 Failed junit tests hadoop.hdfs.server.datanode.TestBlockScanner
            hadoop.hdfs.server.namenode.TestSecureNameNode
            hadoop.hdfs.shortcircuit.TestShortCircuitCache
            hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations
            hadoop.hdfs.TestDataTransferKeepalive
            hadoop.hdfs.security.TestDelegationTokenForProxyUser
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
            hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
            hadoop.hdfs.server.datanode.TestBlockReplacement
            hadoop.fs.TestSymlinkHdfsFileContext
            hadoop.hdfs.qjournal.TestSecureNNWithQJM
            hadoop.hdfs.server.namenode.TestFileTruncate
            hadoop.hdfs.TestEncryptionZones
            hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot
            hadoop.hdfs.server.datanode.TestDirectoryScanner



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12785530/HDFS-9260.018.patch
          JIRA Issue HDFS-9260
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc
          uname Linux e47f9e4e9ee5 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 8f2622b
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14321/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14321/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14321/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14321/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14321/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14321/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Max memory used 76MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14321/console

          This message was automatically generated.

          cmccabe Colin P. McCabe added a comment -

          Thanks, Staffan Friberg. The unit test failures appear to be flaky tests; they succeed for me locally (except TestDirectoryScanner, which fails for me both with and without the patch applied).

          +1. Will commit in a few hours if there are no further comments. We can do any additional refactoring or optimization in follow-ups.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 15 new or modified test files.
          0 mvndep 0m 9s Maven dependency ordering for branch
          +1 mvninstall 7m 12s trunk passed
          +1 compile 0m 42s trunk passed with JDK v1.8.0_66
          +1 compile 0m 44s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 30s trunk passed
          +1 mvnsite 0m 56s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 2m 7s trunk passed
          +1 javadoc 1m 8s trunk passed with JDK v1.8.0_66
          +1 javadoc 1m 53s trunk passed with JDK v1.7.0_91
          0 mvndep 0m 8s Maven dependency ordering for patch
          +1 mvninstall 0m 50s the patch passed
          +1 compile 0m 41s the patch passed with JDK v1.8.0_66
          +1 cc 0m 41s the patch passed
          +1 javac 0m 41s the patch passed
          +1 compile 0m 41s the patch passed with JDK v1.7.0_91
          +1 cc 0m 41s the patch passed
          +1 javac 0m 41s the patch passed
          -1 checkstyle 0m 29s hadoop-hdfs-project/hadoop-hdfs: patch generated 3 new + 685 unchanged - 10 fixed = 688 total (was 695)
          +1 mvnsite 0m 53s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 20s the patch passed
          +1 javadoc 1m 5s the patch passed with JDK v1.8.0_66
          +1 javadoc 1m 52s the patch passed with JDK v1.7.0_91
          -1 unit 62m 20s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
          -1 unit 61m 27s hadoop-hdfs in the patch failed with JDK v1.7.0_91.
          +1 asflicense 0m 25s Patch does not generate ASF License warnings.
          151m 34s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.hdfs.TestAclsEndToEnd
            hadoop.hdfs.server.datanode.TestBlockScanner
            hadoop.hdfs.TestFileAppend
          JDK v1.8.0_66 Timed out junit tests org.apache.hadoop.hdfs.TestLeaseRecovery2
          JDK v1.7.0_91 Failed junit tests hadoop.hdfs.server.namenode.TestCacheDirectives



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12785530/HDFS-9260.018.patch
          JIRA Issue HDFS-9260
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc
          uname Linux cc9f5f9bfe15 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / ed55950
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14327/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14327/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14327/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14327/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14327/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14327/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Max memory used 76MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14327/console

          This message was automatically generated.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9227 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9227/)
          HDFS-9260. Improve the performance and GC friendliness of NameNode (cmccabe: rev dd9ebf6eedfd4ff8b3486eae2a446de6b0c7fa8a)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAddStripedBlocks.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockReportContext.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeHotSwapVolumes.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoStriped.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDnRespectsBlockReportSplitThreshold.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesCombinedBlockReport.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestTriggerBlockReport.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/ReplicaMap.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockInfo.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoContiguous.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/FoldedTreeSet.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/util/FoldedTreeSetTest.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlocksMap.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesBlockReportPerStorage.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
          cmccabe Colin P. McCabe added a comment -

          Committed to trunk. Thanks, Staffan Friberg and Jing Zhao.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9230 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9230/)
          CHANGES.txt: Move HDFS-9260 to trunk (cmccabe: rev 913676dc355f17dc41b75be1b3a27114197ea52c)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          vinayrpet Vinayakumar B added a comment -

          How about bringing this into branch-2?

          cmccabe Colin P. McCabe added a comment -

          Sure, would you like to backport it to 2.9?

          daryn Daryn Sharp added a comment -

          I'd like to propose this patch be reverted on the NN, but perhaps not the DN. Partially because it complicates porting HDFS-11310, but also due to performance concerns.

          Synopsis of the design change: Reference-dense datastructures with high mutation rates are the enemy of young gen gc pauses. When a tenured reference is updated, a dirty card tracks the memory region of the changed reference. The next young gen gc checks traceability to references in the dirty card regions. FBR processing rewrites triplet references to identify unreported blocks, causing a spike in young gen gc load. A folded/sorted tree is a clever means to avoid rewriting the triplets pointer during an FBR, thus reducing young gen gc pressure.

          That said, here are the main issues I see:

          1. FBRs: DNs must send sorted block reports, else the NN creates a temporary sorted/folded tree for iteration. A new block report encoding was designed (by me) for in-place iteration to reduce high object allocation rates that made FBR processing unacceptable at scale. Building the sorted tree undoes the benefit by exploding the entire report into a highly fragmented tree (out of order insertion). During a rolling upgrade this will place extreme pressure on the NN until all DNs are upgraded.
          2. IBRs: The performance of the tree is predicated on increasing block ids to avoid fragmentation while filling the tree. However, block churn and organic replication from dead nodes, decommissioning nodes, failed storages, block balancing, etc will pepper a node with random blocks. The IBRs will cause trees to quickly fragment. The tree mutations are likely to cause more dirty cards than simply linking/unlinking a block into the triplets.
          3. Tree compaction: Every 10 mins all sufficiently fragmented storage trees will be compacted. In practice this may be a large portion of the cluster storages due to IBRs, translating to bursts of very heavy gc load. Heap growth will increase due to defunct tree nodes.
          4. The CMS remark time reduction is not compelling when cycles should occur every few days or a week if the heap is adequately sized.

          The doc primarily focuses on FBRs with a footnote of 4X increase in IBR processing and negative impacts to balancing. Impacts to balancing are equivalent to replication from dead nodes, failed storages, decommissioning nodes, invalidations when dead nodes rejoin, etc. FBR savings are great, but not at the expense of increased processing and gc load from IBRs.

          Unless there are real-world metrics with a large cluster under load that dispute my concerns, I think we should revert.
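
          [Editorial note: to make the mechanism under debate concrete, below is a minimal, purely illustrative Java sketch. It is not the actual BlockManager or FoldedTreeSet code, and the class and method names are invented. It shows the merge-style comparison that sorted full block reports enable: because both the report and the NameNode's per-storage block set are in block-id order, a single pass identifies blocks that are new to the NameNode and blocks that went unreported, without touching or relinking the entries that did not change.]

import java.util.ArrayList;
import java.util.List;

/** Illustrative only: diff a sorted block report against sorted stored ids. */
public class SortedReportDiff {

  /** Holds the two interesting outcomes of the comparison. */
  public static class Diff {
    public final List<Long> onlyInReport = new ArrayList<>();   // reported but unknown to the NN
    public final List<Long> onlyInStorage = new ArrayList<>();  // tracked by the NN but unreported
  }

  /** Both arrays must be sorted ascending by block id. */
  public static Diff diff(long[] reportedIds, long[] storedIds) {
    Diff d = new Diff();
    int r = 0, s = 0;
    while (r < reportedIds.length && s < storedIds.length) {
      int cmp = Long.compare(reportedIds[r], storedIds[s]);
      if (cmp == 0) {
        r++; s++;                                  // present on both sides: nothing to update
      } else if (cmp < 0) {
        d.onlyInReport.add(reportedIds[r++]);      // DN reports a block the NN does not track
      } else {
        d.onlyInStorage.add(storedIds[s++]);       // NN tracks a block the DN no longer reports
      }
    }
    while (r < reportedIds.length) d.onlyInReport.add(reportedIds[r++]);
    while (s < storedIds.length) d.onlyInStorage.add(storedIds[s++]);
    return d;
  }

  public static void main(String[] args) {
    Diff d = diff(new long[] {1, 2, 4, 7}, new long[] {1, 3, 4, 7});
    // Prints: new: [2], unreported: [3]
    System.out.println("new: " + d.onlyInReport + ", unreported: " + d.onlyInStorage);
  }
}

          [If the DataNode does not send the report already sorted, the NameNode must first build a sorted structure from it, which is the extra allocation cost described above for the rolling-upgrade window.]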

          andrew.wang Andrew Wang added a comment -

          FWIW this has been in a released version of CDH for 10 months, and is running in production at some of our highest scale customers. We did see one issue where we mistakenly weren't sending sorted FBRs even after an upgrade, but nothing since this was fixed.

          Daryn, what metrics are you looking for specifically? I can ask our field about collecting these on some big clusters.

          daryn Daryn Sharp added a comment -

          I have no doubt this patch "works". If you are collecting GC metrics, check whether you see elevation or spikes in frequency or CPU over historic levels. Roughly what are the specs for the "highest scale" you have observed?

          This blocks HDFS-7967, which is a critical feature for our clusters - it's been a deployment blocker since 2.6. The balancer is unusable with 500 million to over 1 billion replicas on dense storages. I have detailed in a comment on HDFS-7967 that getBlocks will take hundreds of ms. Sustaining 20-40k average ops/sec is impossible.

          Rolling upgrades also already place very high GC pressure on the NN, some narrowly escaping a full GC. I'm very worried if the NN has to re-sort FBRs during the upgrade.

          I'd like to stop internally maintaining HDFS-7967 and let the community benefit. May we please revert?

          andrew.wang Andrew Wang added a comment -

          Hi Daryn,

          I talked to our people who help run this large customer cluster. It's at about 350 million blocks, so a pretty good size, and also a lot denser than the 4500-node Yahoo cluster in the last published stats I saw. We don't have historical GC metrics going back a year to when we put this into CDH, but they haven't seen anything abnormal in terms of GC.

          They were quite interested in your balancer settings, though, since we haven't seen it stressing the NN. Could you provide the following?

          dfs.datanode.balance.bandwidthPerSec
          dfs.datanode.balance.max.concurrent.moves
          dfs.namenode.replication.work.multiplier.per.iteration
          dfs.namenode.replication.max-streams-hard-limit
          

          I believe we're running it with mostly default settings like this:

          hdfs balancer -Ddfs.datanode.balance.max.concurrent.moves=200 -threshold 10
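
          [Editorial note: to make the keys above concrete, here is a hedged, purely illustrative Java sketch that sets them through Hadoop's Configuration API. The values are examples only, not defaults or recommendations, and in a real deployment these properties would normally be set in hdfs-site.xml on the NameNode and DataNodes rather than in client code.]

import org.apache.hadoop.conf.Configuration;

public class BalancerTuningExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Per-DataNode bandwidth budget for balancing, in bytes per second (10 MB/s here).
    conf.setLong("dfs.datanode.balance.bandwidthPerSec", 10L * 1024 * 1024);
    // Maximum concurrent block moves a DataNode will perform for the balancer.
    conf.setInt("dfs.datanode.balance.max.concurrent.moves", 50);
    // Controls how much replication work the NameNode schedules per iteration.
    conf.setInt("dfs.namenode.replication.work.multiplier.per.iteration", 2);
    // Hard cap on replication streams for a given node.
    conf.setInt("dfs.namenode.replication.max-streams-hard-limit", 4);
    // Print one of the keys back to confirm the value was picked up.
    System.out.println(conf.get("dfs.datanode.balance.bandwidthPerSec"));
  }
}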
          
          andrew.wang Andrew Wang added a comment -

          One other question from our team: what's the typical block size on a Yahoo cluster?

          Show
          andrew.wang Andrew Wang added a comment - One other question from our team, what's the typical block size on a Yahoo cluster?

            People

            • Assignee:
              sfriberg Staffan Friberg
            • Reporter:
              sfriberg Staffan Friberg
            • Votes:
              0
            • Watchers:
              30
