Hadoop HDFS / HDFS-10301

BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.6.1
    • Fix Version/s: 2.8.0, 2.7.4, 3.0.0-alpha2
    • Component/s: namenode
    • Labels:
      None

      Description

      When the NameNode is busy, a DataNode can time out while sending a block report, and then it sends the block report again. The NameNode, processing these two reports at the same time, can interleave the processing of storages from different reports. This corrupts the blockReportId bookkeeping, which makes the NameNode think that some storages are zombies. Replicas on zombie storages are immediately removed, causing missing blocks.
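      For illustration only, here is a minimal, self-contained sketch of the race (hypothetical class and field names; the real bookkeeping lives in DatanodeDescriptor and BlockManager). A retransmitted report overwrites the per-DataNode "current" report ID while storages from the earlier report are still waiting to be processed, so those storages fail the zombie check:

          import java.util.ArrayList;
          import java.util.HashMap;
          import java.util.List;
          import java.util.Map;

          // Simplified model of the per-DataNode state shared by all storages.
          public class ZombieRaceSketch {
            private long curBlockReportId;                       // overwritten by each report
            private final Map<String, Long> lastReportIdPerStorage = new HashMap<>();

            // Called once per storage report; storages of BR1 and BR2 can interleave here.
            void processStorageReport(String storageId, long blockReportId) {
              curBlockReportId = blockReportId;
              lastReportIdPerStorage.put(storageId, blockReportId);
            }

            // Zombie check: any storage not tagged with the current report ID. If BR2
            // interleaves with BR1, storages still carrying BR1's ID are falsely declared
            // zombie and their replicas are removed.
            List<String> findZombies() {
              List<String> zombies = new ArrayList<>();
              for (Map.Entry<String, Long> e : lastReportIdPerStorage.entrySet()) {
                if (e.getValue() != curBlockReportId) {
                  zombies.add(e.getKey());
                }
              }
              return zombies;
            }

            public static void main(String[] args) {
              ZombieRaceSketch dn = new ZombieRaceSketch();
              dn.processStorageReport("s1", 1L);                 // storage s1 processed under BR1
              dn.processStorageReport("s2", 2L);                 // retransmitted BR2 interleaves
              System.out.println(dn.findZombies());              // prints [s1]: a false zombie
            }
          }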

      Attachments

      1. HDFS-10301.002.patch
        11 kB
        Colin P. McCabe
      2. HDFS-10301.003.patch
        16 kB
        Colin P. McCabe
      3. HDFS-10301.004.patch
        29 kB
        Vinitha Reddy Gankidi
      4. HDFS-10301.005.patch
        16 kB
        Colin P. McCabe
      5. HDFS-10301.006.patch
        31 kB
        Vinitha Reddy Gankidi
      6. HDFS-10301.007.patch
        32 kB
        Vinitha Reddy Gankidi
      7. HDFS-10301.008.patch
        32 kB
        Vinitha Reddy Gankidi
      8. HDFS-10301.009.patch
        32 kB
        Vinitha Reddy Gankidi
      9. HDFS-10301.01.patch
        16 kB
        Walter Su
      10. HDFS-10301.010.patch
        32 kB
        Vinitha Reddy Gankidi
      11. HDFS-10301.011.patch
        32 kB
        Vinitha Reddy Gankidi
      12. HDFS-10301.012.patch
        32 kB
        Vinitha Reddy Gankidi
      13. HDFS-10301.013.patch
        38 kB
        Vinitha Reddy Gankidi
      14. HDFS-10301.014.patch
        25 kB
        Vinitha Reddy Gankidi
      15. HDFS-10301.015.patch
        26 kB
        Vinitha Reddy Gankidi
      16. HDFS-10301.branch-2.015.patch
        24 kB
        Konstantin Shvachko
      17. HDFS-10301.branch-2.7.015.patch
        20 kB
        Vinitha Reddy Gankidi
      18. HDFS-10301.branch-2.7.patch
        28 kB
        Vinitha Reddy Gankidi
      19. HDFS-10301.branch-2.patch
        30 kB
        Konstantin Shvachko
      20. HDFS-10301.sample.patch
        4 kB
        Daryn Sharp
      21. zombieStorageLogs.rtf
        32 kB
        Konstantin Shvachko

        Issue Links

          Activity

          shv Konstantin Shvachko added a comment -

          More details.

          1. My DataNode has 6 storages. It sends a block report and times out, then it sends the same block report five more times with different blockReportIds.
          2. The NameNode starts executing all six reports around the same time and interleaves them, that is, it processes the first storage of BR2 before it processes the last storage of BR1. (Color-coded logs are coming.)
          3. While processing storages from BR2 the NameNode changes the lastBlockReportId field to the ID of BR2. This interferes with the processing of storages from BR1 that have not been processed yet: those storages are considered zombies, and all replicas are removed from those storages along with the storage itself.
          4. The storage is then reconstructed by the NameNode when it receives a heartbeat from the DataNode and is marked as "stale", but the replicas will not be restored until the next block report, which in my case comes a few hours later.
          5. I noticed missing blocks because several DataNodes exhibited the same behavior and all replicas of the same block were lost.
          6. The replicas eventually reappeared (several hours later), because DataNodes do not physically remove the replicas and report them in the next block report.

          The behavior was introduced by HDFS-7960 as part of the hot-swap feature. I did not do a hot-swap, and I did not fail over the NameNode.

          shv Konstantin Shvachko added a comment -

          My DN has the following six storages:

          DS-019298c0-aab9-45b4-8b62-95d6809380ff:NORMAL:kkk.sss.22.105
          DS-0ea95238-d9ba-4f62-ae18-fdb9333465ce:NORMAL:kkk.sss.22.105
          DS-191fc04b-90be-42c9-b6fb-fdd1517bf4c7:NORMAL:kkk.sss.22.105
          DS-4a2e91c7-cdf0-408b-83a6-286c3534d673:NORMAL:kkk.sss.22.105
          DS-5b2941f7-2b52-45a8-b135-dcbe488cc65b:NORMAL:kkk.sss.22.105
          DS-6849f605-fd83-462d-97c3-cb6949383f7e:NORMAL:kkk.sss.22.105
          

          Here are the logs for its block reports. All throw the same exception, but I pasted it only once.

          2016-04-12 22:31:58,931 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report 0x283d25423fb64d,  containing 6 storage report(s), of which we sent 0. The reports had 81565 total blocks and used 0 RPC(s). This took 19 msec to generate and 60078 msecs for RPC and NN processing. Got back no commands.
          2016-04-12 22:31:58,931 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
          java.net.SocketTimeoutException: Call From dn-hcl1264.my.cluster.com/kkk.sss.22.105 to namenode-ha1.my.cluster.com:9000 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/kkk.sss.22.105:10101 remote=namenode-ha1.my.cluster.com/10.150.1.56:9000]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout
                  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
                  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
                  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
                  at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
                  at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
                  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:750)
                  at org.apache.hadoop.ipc.Client.call(Client.java:1473)
                  at org.apache.hadoop.ipc.Client.call(Client.java:1400)
                  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
                  at com.sun.proxy.$Proxy12.blockReport(Unknown Source)
                  at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:178)
                  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:494)
                  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:732)
                  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:872)
                  at java.lang.Thread.run(Thread.java:745)
          Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/kkk.sss.22.105:10101 remote=namenode-ha1.my.cluster.com/10.150.1.56:9000]
                  at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
                  at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
                  at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
                  at java.io.FilterInputStream.read(FilterInputStream.java:133)
                  at java.io.FilterInputStream.read(FilterInputStream.java:133)
                  at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:514)
                  at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
                  at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
                  at java.io.DataInputStream.readInt(DataInputStream.java:387)
                  at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1072)
                  at org.apache.hadoop.ipc.Client$Connection.run(Client.java:967)
          
          2016-04-12 22:32:59,179 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report 0x283d334a100bde,  containing 6 storage report(s), of which we sent 0. The reports had 81565 total blocks and used 0 RPC(s). This took 17 msec to generate and 60066 msecs for RPC and NN processing. Got back no commands.
          2016-04-12 22:33:59,311 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report 0x283d414ae386b2,  containing 6 storage report(s), of which we sent 0. The reports had 81565 total blocks and used 0 RPC(s). This took 16 msec to generate and 60055 msecs for RPC and NN processing. Got back no commands.
          2016-04-12 22:34:59,409 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report 0x283d4f4a605732,  containing 6 storage report(s), of which we sent 0. The reports had 81565 total blocks and used 0 RPC(s). This took 16 msec to generate and 60032 msecs for RPC and NN processing. Got back no commands.
          2016-04-12 22:35:59,585 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report 0x283d5d4ca9bf5c,  containing 6 storage report(s), of which we sent 0. The reports had 81565 total blocks and used 0 RPC(s). This took 15 msec to generate and 60040 msecs for RPC and NN processing. Got back no commands.
          2016-04-12 22:36:47,307 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Successfully sent block report 0x283d6b4ac1b50a,  containing 6 storage report(s), of which we sent 6. The reports had 81565 total blocks and used 1 RPC(s). This took 17 msec to generate and 47664 msecs for RPC and NN processing. Got back one command: FinalizeCommand/5.
          

          I'll attach the logs for processing these six block reports on the NameNode. Each color represents a single report. You can see how the colors interleave, and the zombie storage messages in the middle.

          daryn Daryn Sharp added a comment -

          Enabling HDFS-9198 will fifo process BRs. It doesn't solve this implementation bug but virtually eliminates it from occurring.

          shv Konstantin Shvachko added a comment -

          Hey Daryn, I'm not sure how HDFS-9198 eliminates it from occurring. DataNodes are still waiting for the NN to process each BR, so they can time out and send the same block report multiple times. On the NN side, BR ops processing is multi-threaded, so it can still interleave processing of storages from different reports. Could you please clarify what I am missing?

          walter.k.su Walter Su added a comment -

          1. The IPC reader is single-threaded by default. If it's multi-threaded, the order of putting RPC requests into the callQueue is unspecified.
          2. The IPC callQueue is FIFO.
          3. The IPC handler is multi-threaded. If two handlers are both waiting on the fsn lock, the entry order depends on the fairness of the lock.

          When constructed as fair, threads contend for entry using an approximately arrival-order policy. When the currently held lock is released either the longest-waiting single writer thread will be assigned the write lock... (quoted from https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReentrantReadWriteLock.html)

          I think if the DN can't get an ack from the NN, it shouldn't assume the arrival/processing order (especially when re-establishing a connection). Still, I'm curious how the interleaving happened. Any thoughts?
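          For reference, a fair ReentrantReadWriteLock is constructed as below (this is just the standard JDK API, not HDFS code); fairness only gives approximately arrival-order entry per lock acquisition, it does not make a whole block report atomic:

              import java.util.concurrent.locks.ReentrantReadWriteLock;

              public class FairLockExample {
                public static void main(String[] args) {
                  // 'true' selects fair mode: waiting writers acquire the lock in roughly
                  // arrival order, which governs the order in which handler threads enter
                  // the fsn write lock for each storage report.
                  ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock(true);
                  fsnLock.writeLock().lock();
                  try {
                    // ... process one storage report under the write lock ...
                  } finally {
                    fsnLock.writeLock().unlock();
                  }
                }
              }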

          walter.k.su Walter Su added a comment -

          Oh, I see. In this case, the reports are not split across RPCs. And because the for-loop is outside the lock, the two for-loops interleave.

          // NN-side loop over the storage reports of one block report RPC; each iteration
          // takes and releases the namesystem lock, so two concurrent reports can interleave.
          for (int r = 0; r < reports.length; r++) {
          
          walter.k.su Walter Su added a comment -

          Uploaded a patch. Kindly review.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 11s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          +1 mvninstall 6m 29s trunk passed
          +1 compile 0m 38s trunk passed with JDK v1.8.0_77
          +1 compile 0m 41s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 23s trunk passed
          +1 mvnsite 0m 50s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 1m 51s trunk passed
          +1 javadoc 1m 5s trunk passed with JDK v1.8.0_77
          +1 javadoc 1m 43s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 46s the patch passed
          +1 compile 0m 38s the patch passed with JDK v1.8.0_77
          -1 javac 6m 15s hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77 with JDK v1.8.0_77 generated 1 new + 32 unchanged - 1 fixed = 33 total (was 33)
          +1 javac 0m 38s the patch passed
          +1 compile 0m 39s the patch passed with JDK v1.7.0_95
          -1 javac 6m 54s hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95 with JDK v1.7.0_95 generated 1 new + 34 unchanged - 1 fixed = 35 total (was 35)
          +1 javac 0m 39s the patch passed
          +1 checkstyle 0m 22s the patch passed
          +1 mvnsite 0m 49s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          -1 findbugs 2m 15s hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
          +1 javadoc 1m 4s the patch passed with JDK v1.8.0_77
          +1 javadoc 1m 46s the patch passed with JDK v1.7.0_95
          -1 unit 55m 36s hadoop-hdfs in the patch failed with JDK v1.8.0_77.
          -1 unit 52m 52s hadoop-hdfs in the patch failed with JDK v1.7.0_95.
          +1 asflicense 0m 21s Patch does not generate ASF License warnings.
          133m 26s



          Reason Tests
          FindBugs module:hadoop-hdfs-project/hadoop-hdfs
            Synchronization performed on java.util.concurrent.ArrayBlockingQueue in org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.enqueue(List) At BlockManager.java:org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.enqueue(List) At BlockManager.java:[line 4495]
          JDK v1.8.0_77 Failed junit tests hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead
            hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
            hadoop.hdfs.TestSafeMode
            hadoop.hdfs.server.namenode.TestNamenodeRetryCache
            hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
          JDK v1.7.0_95 Failed junit tests hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead
            hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
            hadoop.hdfs.server.namenode.TestNamenodeRetryCache
            hadoop.hdfs.TestHFlush
            hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:fbe3e86
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12799446/HDFS-10301.01.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 7a855cb48a3c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / be0bce1
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          javac hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77: https://builds.apache.org/job/PreCommit-HDFS-Build/15194/artifact/patchprocess/diff-compile-javac-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt
          javac hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95: https://builds.apache.org/job/PreCommit-HDFS-Build/15194/artifact/patchprocess/diff-compile-javac-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/15194/artifact/patchprocess/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15194/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15194/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/15194/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt https://builds.apache.org/job/PreCommit-HDFS-Build/15194/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15194/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15194/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          cmccabe Colin P. McCabe added a comment - edited

          Thanks for the bug report. This is a tricky one.

          One small correction-- HDFS-7960 was not introduced as part of DataNode hotswap. It was originally introduced to solve issues caused by HDFS-7575, although it fixed issues with hotswap as well.

          It seems like we should be able to remove existing DataNode storage report RPCs with the old ID from the queue when we receive one with a new block report ID. This would also avoid a possible congestion collapse scenario caused by repeated retransmissions after the timeout.
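          A rough sketch of that idea (not from any attached patch; the queued element type, field names, and the assumption that block report IDs from a single DataNode increase monotonically are all hypothetical):

              import java.util.concurrent.ArrayBlockingQueue;
              import java.util.concurrent.BlockingQueue;

              public class QueueDedupSketch {
                // Hypothetical wrapper for a queued storage-report task.
                static class QueuedReport {
                  final String datanodeUuid;
                  final long blockReportId;
                  QueuedReport(String dn, long id) { datanodeUuid = dn; blockReportId = id; }
                }

                private final BlockingQueue<QueuedReport> queue = new ArrayBlockingQueue<>(1024);

                // On arrival of a new report, drop any still-queued reports from the same
                // DataNode that carry an older block report ID, then enqueue the new one.
                // Sketch only: concurrent enqueues for the same DataNode would need extra
                // coordination in a real implementation.
                void enqueueLatest(QueuedReport incoming) throws InterruptedException {
                  queue.removeIf(q -> q.datanodeUuid.equals(incoming.datanodeUuid)
                      && q.blockReportId < incoming.blockReportId);
                  queue.put(incoming);
                }
              }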

          shv Konstantin Shvachko added a comment -

          Hey Walter, your patch looks good by itself, but it does not address the bug in the zombie storage recognition.
          It took me some time to review your patch; it would have been easier if you had explained your approach.
          So your patch is reordering block reports for different storages in such a way that storages from the same report are placed as a contiguous segment in the block report queue, so that processing of different BRs is not interleaved. This addresses Daryn's comment rather than solving the reported bug, as BTW Daryn correctly stated.
          If you want to go forward with reordering of BRs you should probably do it in another issue. I personally am not a supporter because

          1. It introduces an unnecessary restriction on the order of execution of block reports, and
          2. adds even more complexity to BR processing logic.

          The main problem I see here is that block reports used to be idempotent per storage, but HDFS-7960 made the execution for a subsequent storage dependent on the state produced during the execution of the previous ones. I think idempotence is good, and we should keep it. I think we can mitigate the problem by one of the following:

          1. Changing the criteria of zombie storage recognition. Why should it depend on block report IDs?
          2. Eliminating the notion of zombie storage altogether. E.g., the NN can ask the DN to run the DirectoryScanner if the NN thinks the DN's state is outdated.
          3. Trying to move curBlockReportId from DatanodeDescriptor to StorageInfo, which would eliminate global state shared between storages (a rough sketch follows this list).
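          A rough sketch of what option 3 could look like (hypothetical names, not from any attached patch): the report ID is tracked per storage, so processing one report can never invalidate storages that belong to a different, still-in-flight report.

              // Per-storage bookkeeping instead of a DataNode-wide curBlockReportId.
              class StorageInfoSketch {
                private long lastBlockReportId;

                void receivedBlockReport(long blockReportId) {
                  this.lastBlockReportId = blockReportId;   // updated only for THIS storage
                }

                long getLastBlockReportId() {
                  return lastBlockReportId;
                }
              }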

          Also, if we cannot come up with a quick solution, then we should probably roll back HDFS-7960 for now and revisit it later, because this is a critical bug affecting all of our latest releases. And that is a lot of clusters and PBs out there.

          cmccabe Colin P. McCabe added a comment -

          Hmm. This is a challenging one. Walter Su, I think I agree that the queue added in HDFS-9198 might be part of the problem here. In CDH, we haven't yet backported the deferred queuing stuff implemented in HDFS-9198, which might explain why we never saw this. Since we don't have a queue, and since NN RPCs are almost always handled in the order they arrive, CDH5 doesn't implement "reordering" of resent storage reports.

          Independently of this bug, I do think it's concerning that the DN keeps piling on retransmissions of FBRs even before the old ones were processed and acknowledged. This kind of behavior will obviously lead to congestion collapse if congestion is what caused the original FBRs to be processed but not acknowledged.

              // BlockReportProcessingThread.enqueue(List) as proposed in the 01 patch:
              void enqueue(List<Runnable> actions) throws InterruptedException {
                synchronized (queue) {
                  for (Runnable action : actions) {
                    if (!queue.offer(action)) {
                      if (!isAlive() && namesystem.isRunning()) {
                        ExitUtil.terminate(1, getName() + " is not running");
                      }
                      long now = Time.monotonicNow();
                      if (now - lastFull > 4000) {
                        lastFull = now;
                        LOG.info("Block report queue is full");
                      }
                      queue.put(action);
                    }
                  }
                }
              }
          

          This is going to be problematic when contention gets high, because threads will spend a long time waiting to enter the synchronized (queue) section. And this will not be logged or reflected back to the admin in any way. Unfortunately, the operation that you want here, the ability to atomically add a bunch of items to the BlockingQueue, simply is not provided by BlockingQueue. The solution also seems somewhat brittle since reordering could happen because of network issues in a multi-RPC BlockReport.
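          For what it's worth, one generic way to get an atomic "put a whole batch" operation is to guard a plain deque with an explicit lock instead of relying on BlockingQueue. This is only a sketch under that assumption (it also assumes every batch fits within the capacity), not code from any patch:

              import java.util.ArrayDeque;
              import java.util.Deque;
              import java.util.List;
              import java.util.concurrent.locks.Condition;
              import java.util.concurrent.locks.ReentrantLock;

              public class AtomicBatchQueueSketch<T> {
                private static final int CAPACITY = 1024;
                private final Deque<T> queue = new ArrayDeque<>();
                private final ReentrantLock lock = new ReentrantLock();
                private final Condition notFull = lock.newCondition();
                private final Condition notEmpty = lock.newCondition();

                // Blocks until the entire batch fits, then inserts it contiguously.
                public void putAll(List<T> batch) throws InterruptedException {
                  lock.lockInterruptibly();
                  try {
                    while (queue.size() + batch.size() > CAPACITY) {
                      notFull.await();
                    }
                    queue.addAll(batch);
                    notEmpty.signalAll();
                  } finally {
                    lock.unlock();
                  }
                }

                public T take() throws InterruptedException {
                  lock.lockInterruptibly();
                  try {
                    while (queue.isEmpty()) {
                      notEmpty.await();
                    }
                    T item = queue.poll();
                    notFull.signalAll();
                    return item;
                  } finally {
                    lock.unlock();
                  }
                }
              }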

          I'm thinking about this a little more, and it seems like the root of the problem is that in the single-RPC case, we're throwing away the information about how many storages were in the original report. We need to find a way to include that information in there...

          cmccabe Colin P. McCabe added a comment -

          I have posted a new patch as HDFS-10301.002.patch. The idea here is that we know the number of storage reports we expect to see in the block report. We should not be removing any storages as zombies unless we have seen this number of storages and marked these storages with the ID of the latest block report.

          I feel that this approach is better than the one used in 001.patch, since it correctly handles the "interleaved" case. It is very difficult to prove that we can never get interleaved storage reports for the DataNode. This is because of issues like queuing inside the RPC system, packets getting reordered or delayed by the network, and queuing inside the deferred work mechanism added by HDFS-9198. So we should handle this case correctly.
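          A rough, single-threaded sketch of that approach (hypothetical names; the actual patch keeps this bookkeeping inside the NameNode's per-DataNode structures): zombie pruning only runs once every expected storage has been seen under the latest block report ID.

              import java.util.HashMap;
              import java.util.Map;

              public class ZombiePruneGuardSketch {
                private long curBlockReportId;
                private int expectedStorages;   // number of storage reports in the block report
                private int storagesSeen;
                private final Map<String, Long> lastReportIdPerStorage = new HashMap<>();

                void startReport(long brId, int totalStorages) {
                  curBlockReportId = brId;
                  expectedStorages = totalStorages;
                  storagesSeen = 0;
                }

                void processStorage(String storageId, long brId) {
                  if (brId != curBlockReportId) {
                    return;                     // interleaved/stale report: skip zombie accounting
                  }
                  lastReportIdPerStorage.put(storageId, brId);
                  storagesSeen++;
                  if (storagesSeen == expectedStorages) {
                    pruneZombies();             // safe: every live storage carries brId by now
                  }
                }

                private void pruneZombies() {
                  lastReportIdPerStorage.values().removeIf(id -> id != curBlockReportId);
                }
              }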

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 12s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 6m 39s trunk passed
          +1 compile 0m 43s trunk passed with JDK v1.8.0_77
          +1 compile 0m 41s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 23s trunk passed
          +1 mvnsite 0m 52s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 1m 55s trunk passed
          +1 javadoc 1m 7s trunk passed with JDK v1.8.0_77
          +1 javadoc 1m 51s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 45s the patch passed
          +1 compile 0m 37s the patch passed with JDK v1.8.0_77
          +1 javac 0m 37s the patch passed
          +1 compile 0m 38s the patch passed with JDK v1.7.0_95
          +1 javac 0m 38s the patch passed
          -1 checkstyle 0m 21s hadoop-hdfs-project/hadoop-hdfs: patch generated 3 new + 217 unchanged - 0 fixed = 220 total (was 217)
          +1 mvnsite 0m 48s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 10s the patch passed
          +1 javadoc 1m 4s the patch passed with JDK v1.8.0_77
          +1 javadoc 1m 45s the patch passed with JDK v1.7.0_95
          -1 unit 60m 10s hadoop-hdfs in the patch failed with JDK v1.8.0_77.
          -1 unit 59m 18s hadoop-hdfs in the patch failed with JDK v1.7.0_95.
          +1 asflicense 0m 23s Patch does not generate ASF License warnings.
          144m 44s



          Reason Tests
          JDK v1.8.0_77 Failed junit tests hadoop.hdfs.server.datanode.TestFsDatasetCache
          JDK v1.7.0_95 Failed junit tests hadoop.hdfs.shortcircuit.TestShortCircuitCache
            hadoop.hdfs.server.namenode.TestEditLog
            hadoop.hdfs.server.datanode.TestFsDatasetCache



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:fbe3e86
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12800061/HDFS-10301.002.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 92e06b70f89b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / a749ba0
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/15245/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15245/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15245/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/15245/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt https://builds.apache.org/job/PreCommit-HDFS-Build/15245/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15245/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15245/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          cmccabe Colin P. McCabe added a comment -

          Added a unit test.

          walter.k.su Walter Su added a comment -

          Enabling HDFS-9198 will fifo process BRs. It doesn't solve this implementation bug but virtually eliminates it from occurring.

          This addresses Daryn's comment rather than solving the reported bug, as BTW Daryn correctly stated.

          That's incorrect. Please run the test in the 001 patch with and without the fix; you'll see the difference. It does solve the issue, because:

          The bug only exists when the reports are contained in one RPC. If they are split into multiple RPCs, it's not a problem, because the rpcsSeen guard prevents it from happening. So my approach is to process the reports contained in one RPC contiguously, by putting them into the queue atomically.

          walter.k.su Walter Su added a comment -

          The handler threads will wait anyway, either on the queue monitor or on the fsn writeLock. The queue processing thread will contend for the fsn writeLock. In the end, there's no difference.

          walter.k.su Walter Su added a comment -

          I like your idea of counting storages with the same reportId, and not purging if there's any interleaving. I guess rpcsSeen can be removed or replaced by storagesSeen?

          Processing the retransmitted reports is a waste of resources. I think the best approach is, as Colin said, "to remove existing DataNode storage report RPCs with the old ID from the queue when we receive one with a new block report ID." Let's consider that as an optimization in another JIRA.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 15s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 8m 52s trunk passed
          +1 compile 1m 9s trunk passed with JDK v1.8.0_77
          +1 compile 0m 54s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 25s trunk passed
          +1 mvnsite 1m 10s trunk passed
          +1 mvneclipse 0m 17s trunk passed
          +1 findbugs 2m 22s trunk passed
          +1 javadoc 1m 27s trunk passed with JDK v1.8.0_77
          +1 javadoc 2m 26s trunk passed with JDK v1.7.0_95
          +1 mvninstall 1m 1s the patch passed
          +1 compile 1m 0s the patch passed with JDK v1.8.0_77
          +1 javac 1m 0s the patch passed
          +1 compile 0m 50s the patch passed with JDK v1.7.0_95
          +1 javac 0m 50s the patch passed
          -1 checkstyle 0m 23s hadoop-hdfs-project/hadoop-hdfs: patch generated 3 new + 217 unchanged - 0 fixed = 220 total (was 217)
          +1 mvnsite 1m 3s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 27s the patch passed
          +1 javadoc 1m 26s the patch passed with JDK v1.8.0_77
          +1 javadoc 2m 3s the patch passed with JDK v1.7.0_95
          -1 unit 94m 38s hadoop-hdfs in the patch failed with JDK v1.8.0_77.
          -1 unit 90m 50s hadoop-hdfs in the patch failed with JDK v1.7.0_95.
          +1 asflicense 0m 26s Patch does not generate ASF License warnings.
          218m 14s



          Reason Tests
          JDK v1.8.0_77 Failed junit tests hadoop.hdfs.TestDFSUpgradeFromImage
            hadoop.hdfs.server.namenode.web.resources.TestWebHdfsDataLocality
            hadoop.hdfs.server.namenode.ha.TestEditLogTailer
            hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations
            hadoop.hdfs.security.TestDelegationTokenForProxyUser
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
            hadoop.hdfs.TestRollingUpgrade
            hadoop.hdfs.server.namenode.TestEditLog
            hadoop.hdfs.server.datanode.TestDirectoryScanner
          JDK v1.7.0_95 Failed junit tests hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
            hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
            hadoop.hdfs.server.namenode.TestEditLog
            hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
            hadoop.hdfs.TestFileCreationDelete
            hadoop.hdfs.server.namenode.TestDecommissioningStatus
            hadoop.hdfs.server.datanode.TestDirectoryScanner



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:fbe3e86
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12800128/HDFS-10301.003.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux e43756b29eca 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 14ab7a8
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/15249/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15249/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15249/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/15249/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt https://builds.apache.org/job/PreCommit-HDFS-Build/15249/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15249/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15249/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          cmccabe Colin P. McCabe added a comment -

          Yeah, perhaps we should file a follow-on JIRA to optimize by removing the storage reports with an older ID when a newer one was received. The challenge will be implementing it efficiently-- we probably need to move away from BlockingQueue and towards something with our own locking. And probably something other than plain Runnables.

          shv Konstantin Shvachko added a comment -

          Hey Walter Su, sorry let me rephrase my wording.
          Your patch "eliminates" current behavior, but it doesn't directly address the "implementation bug" in zombie storage detection.
          So your patch is correct and you did solve the problem, but indirectly, by reordering reports from the same RPC into a contiguous segment, which in a sense validates the zombie storage bug.
          Also you add a stronger requirement for block report processing: that entire block-report RPCs must be processed in the order they are received, vs. the current requirement that individual storage reports must be processed in the order received. The latter allows interleaving, while yours does not, and your unit test enforces the new order. I am in favor of weaker requirements when possible.
          Hope this makes more sense.

          shv Konstantin Shvachko added a comment -

          Colin P. McCabe I don't think that HDFS-9198 breaks this issue. I am seeing this in 2.6.1, which also doesn't have the queues. You may not observe this behavior until you put substantial write load on the NameNode. Looking at your patch now.

          jingzhao Jing Zhao added a comment -

          As an optimization, maybe we can have a retry-cache-like mechanism on the NN for block reports? We can let the retry of the same FBR share the same block report ID, and based on the lease ID and the report ID the NN can recognize the retry. Then instead of always putting the detailed reports into the queue, we can either directly return the existing response (if the first request has been processed), or let the retry wait until the current on-going FBR processing finishes.
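
          A minimal sketch of such a retry cache, keyed by DataNode and block report ID, could look like the following. The class, method, and field names are invented for illustration (this is not existing NameNode code), and a real implementation would also need eviction and the wait-for-completion behaviour described above.

            import java.util.Map;
            import java.util.concurrent.ConcurrentHashMap;

            /** Hypothetical sketch of a retry cache for full block reports. */
            public class BlockReportRetryCache {
              enum State { IN_PROGRESS, DONE }

              private final Map<String, State> reportsByKey = new ConcurrentHashMap<>();

              private static String key(String datanodeUuid, long blockReportId) {
                return datanodeUuid + ":" + Long.toHexString(blockReportId);
              }

              /** True if this (datanode, reportId) pair is new and should be processed. */
              public boolean shouldProcess(String datanodeUuid, long blockReportId) {
                return reportsByKey.putIfAbsent(
                    key(datanodeUuid, blockReportId), State.IN_PROGRESS) == null;
              }

              /** Marks a report finished so a later retry can reuse the earlier outcome. */
              public void markDone(String datanodeUuid, long blockReportId) {
                reportsByKey.put(key(datanodeUuid, blockReportId), State.DONE);
              }
            }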

          shv Konstantin Shvachko added a comment -

          Colin I ran your unit test and verified that it fails on the current code base, but succeeds with your patch.
          Looking at the patch. Counting storagesSeen does work for your test case. But it is somewhat confusing, as the count changes with interleaving reports.
          Suppose you have 3 storages (s1, s2, s3) and two block reports br1, br2 interleaving in the following way:

          reportId-storage storagesSeen
          br1-s1 0
          br1-s2 1
          br2-s1 0
          br2-s2 1
          br1-s3 0

          The last line is confusing, because it should have been 2, but it is 0 since br2 overrode lastBlockReportId for s1 and s2.
          This brought me to an idea. BR ids are monotonically increasing. What if in BlockManager.processReport() (before processing but under the lock) we check lastBlockReportId for all storages, and if we see one greater than context.getReportId() we throw an IOException indicating that the next block report is in progress and we do not need to continue with this one. The exception is not expected to be passed back to the DataNode, as it has already timed out, but even if it gets passed, the DataNode will just send another block report.
          I think this could be a simple fix for this jira, and we can discuss other approaches to zombie storage detection in the next issue. Current approach seems to be error prone. One way is to go with the retry cache as Jing Zhao suggested. Or there could be other ideas.
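
          A minimal sketch of that check, under the stated assumption that report IDs are monotonically increasing, might look like the following. DatanodeStorageInfoView and its getters are stand-ins for the real NameNode structures, and the check is assumed to run under the namesystem lock before any storage of the report is processed.

            import java.io.IOException;

            /** Hypothetical sketch of the "reject superseded reports" check. */
            public class StaleBlockReportCheck {
              interface DatanodeStorageInfoView {
                long getLastBlockReportId();
                String getStorageID();
              }

              static void checkNotSuperseded(Iterable<DatanodeStorageInfoView> storages,
                                             long currentReportId) throws IOException {
                for (DatanodeStorageInfoView s : storages) {
                  // If any storage was already touched by a newer report, this report
                  // is a stale retransmission and processing can stop here.
                  if (s.getLastBlockReportId() > currentReportId) {
                    throw new IOException("Block report 0x"
                        + Long.toHexString(currentReportId)
                        + " is superseded by a newer report in progress (storage "
                        + s.getStorageID() + ")");
                  }
                }
              }
            }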

          walter.k.su Walter Su added a comment -

          Thank you for your explanation. I learned a lot.

          walter.k.su Walter Su added a comment -

          BR ids are monotonically increasing.

          The id values are random initially; if the counter starts with a large value it could overflow after a long run. If the DN restarts, the value is randomized again. We should be careful that the NN does not end up rejecting all following BRs.
          If a BR is split into multiple RPCs, there is no interleaving naturally, because the DN gets the ack before it sends the next RPC. Interleaving only exists if the BR is not split. I agree the bug needs to be fixed from the inside; it's just that eliminating interleaving for good may not be a bad idea, as it simplifies the problem, and is also a simple workaround for this jira.

          cmccabe Colin P. McCabe added a comment - - edited

          Konstantin Shvachko wrote: The last line is confusing, because it should have been 2, but it is 0 since br2 overrode lastBlockReportId for s1 and s2.

          It's OK for it to be 0 here. It just means that we will not do the zombie storage elimination for these particular full block reports. Remember that interleaved block reports are an extremely rare case, and so are zombie storages. We can wait for the next FBR to do the zombie elimination.

          I think this could be a simple fix for this jira, and we can discuss other approaches to zombie storage detection in the next issue. Current approach seems to be error prone. One way is to go with the retry cache as Jing Zhao suggested. Or there could be other ideas.

          The problem with a retry cache is that it uses up memory. We don't have an easy way to put an upper bound on the amount of memory that we need, except through adding complex logic to limit the number of full block reports accepted for a specific DataNode in a given time period.

          This brought me to an idea. BR ids are monotonically increasing...

          The code for generating block report IDs is here:

            private long generateUniqueBlockReportId() {
              // Initialize the block report ID the first time through.
              // Note that 0 is used on the NN to indicate "uninitialized", so we should
              // not send a 0 value ourselves.
              prevBlockReportId++;
              while (prevBlockReportId == 0) {
                prevBlockReportId = ThreadLocalRandom.current().nextLong();
              }     
              return prevBlockReportId;
            } 
          

          It's not monotonically increasing in the case where rollover occurs. While this is an extremely rare case, the consequences of getting it wrong would be extremely severe. So this might be possible as an incompatible change, but not a change in branch-2.

          Edit: another reason not to do this is because on restart, the DN could get a number lower than its previous one. We can't use IDs as epoch numbers unless we actually persist them to disk, like Paxos transaction IDs or HDFS edit log IDs.
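
          A tiny illustration of those two points (signed-long wraparound, and re-seeding after a restart); the class name is made up and this is demonstration code only, not anything in the DataNode.

            import java.util.concurrent.ThreadLocalRandom;

            public class ReportIdWraparound {
              public static void main(String[] args) {
                long prevBlockReportId = Long.MAX_VALUE;
                prevBlockReportId++;                    // wraps to Long.MIN_VALUE
                System.out.println(prevBlockReportId);  // prints -9223372036854775808

                // After a restart the counter is re-seeded randomly, so the "new" ID
                // can compare lower than any ID sent before the restart.
                long reseeded = ThreadLocalRandom.current().nextLong();
                System.out.println(reseeded);
              }
            }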

          Walter Su wrote: If a BR is split into multiple RPCs, there is no interleaving naturally, because the DN gets the ack before it sends the next RPC. Interleaving only exists if the BR is not split. I agree the bug needs to be fixed from the inside; it's just that eliminating interleaving for good may not be a bad idea, as it simplifies the problem, and is also a simple workaround for this jira.

          We don't document anywhere that interleaving doesn't occur. We don't have unit tests verifying that it doesn't occur, and if we did, those unit tests might accidentally pass because of race conditions. Even if we eliminated interleaving for now, anyone changing the RPC code or the queuing code could easily re-introduce interleaving and this bug would come back. That's why I agree with Konstantin Shvachko-- we should not focus on trying to remove interleaving.

          Konstantin Shvachko wrote: I think this could be a simple fix for this jira, and we can discuss other approaches to zombie storage detection in the next issue.

          Yeah, let's get in this fix and then talk about potential improvements in a follow-on jira.

          shv Konstantin Shvachko added a comment -

          Yes, on trunk generateUniqueBlockReportId() starts from a random value. I was looking at it in a previous version where it was still nanoTime().

          Remember that interleaved block reports are an extremely rare case

          Not really. On a busy cluster with a lot of blocks to report it happens all the time.

          shv Konstantin Shvachko added a comment -

          Hey Colin, I reviewed your patch more thoroughly. There is still a problem with interleaving reports. See updateBlockReportContext(). Suppose that block reports interleave like this: <br1-s1, br2-s1, br1-s2, br2-s2>. Then br1-s2 will reset curBlockReportRpcsSeen since curBlockReportId is not the same as in the report, which will discard the bit set for s1 in br2-s1, and the count of rpcsSeen = 0 will be wrong for br2-s2. So possibly unreported (zombie) storages will not be removed. LMK if you see what I see.
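
          A toy simulation of the interleaving described here, modeling only the reset-on-new-ID behaviour stated in this comment (the field names mirror the comment, not the actual patch), shows how the bit recorded for br2-s1 gets discarded:

            import java.util.BitSet;

            public class InterleavedRpcsSeenDemo {
              static long curBlockReportId = 0;
              static BitSet curBlockReportRpcsSeen = new BitSet();

              static void updateBlockReportContext(long reportId, int rpcIndex) {
                if (reportId != curBlockReportId) {
                  curBlockReportId = reportId;
                  curBlockReportRpcsSeen = new BitSet(); // earlier bits for this id are lost
                }
                curBlockReportRpcsSeen.set(rpcIndex);
              }

              public static void main(String[] args) {
                // Interleaving <br1-s1, br2-s1, br1-s2, br2-s2>:
                updateBlockReportContext(1, 0);  // br1-s1
                updateBlockReportContext(2, 0);  // br2-s1
                updateBlockReportContext(1, 1);  // br1-s2 resets the bitset, dropping br2-s1
                updateBlockReportContext(2, 1);  // br2-s2 sees only one RPC recorded
                // Prints 1 instead of 2, so zombie elimination would be skipped for br2.
                System.out.println(curBlockReportRpcsSeen.cardinality());
              }
            }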

          shv Konstantin Shvachko added a comment -

          Maybe we should go with a different approach for this problem.

          • The problem. NameNode thinks that the reporting DN has the following set of storages <s1, s2, s3>. But the DataNode reports <s1, s2, s4>, because one of its drives was replaced, reformatted, or taken out of service. The NameNode should update the list of storages to the ones reported by the DataNode, potentially removing some of them.
          • Constraints. A single block report can be split into multiple RPCs. Within a single block-report RPC the NameNode processes each storage under a lock, but then releases and re-acquires the lock for the next storage, so that multiple RPC reports can interleave due to multi-threading.
          • Approach. The DN should report the full list of its storages in the first block-report RPC. The NameNode first cleans up unreported storages and the replicas belonging to them, then starts processing the rest of the block reports as usual (see the sketch below).
            So DataNodes explicitly report the storages that they have, which eliminates the NameNode guessing which storage is the last in the block report RPC.

          I did not look at whether any changes in the RPC message structure are needed, but I think that all the necessary fields are already present.
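
          A minimal sketch of the cleanup step in the Approach bullet above, expressed as a plain set difference. The method and parameter names are illustrative and not the actual DatanodeDescriptor API, and the real NameNode would also purge the replicas on each removed storage.

            import java.util.ArrayList;
            import java.util.HashSet;
            import java.util.List;
            import java.util.Set;

            public class UnreportedStorageCleanup {
              static List<String> removeUnreportedStorages(
                  Set<String> storagesKnownToNameNode,
                  Set<String> storagesReportedByDataNode) {
                List<String> removed = new ArrayList<>();
                // Iterate over a copy so the NameNode-side set can be mutated safely.
                for (String storageId : new HashSet<>(storagesKnownToNameNode)) {
                  if (!storagesReportedByDataNode.contains(storageId)) {
                    // The real NameNode would also remove the replicas on this storage;
                    // this sketch only records which storages were dropped.
                    storagesKnownToNameNode.remove(storageId);
                    removed.add(storageId);
                  }
                }
                return removed;
              }
            }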

          cmccabe Colin P. McCabe added a comment -

          Hey Colin, I reviewed your patch more thoroughly. There is still a problem with interleaving reports. See updateBlockReportContext(). Suppose that block reports interleave like this: <br1-s1, br2-s1, br1-s2, br2-s2>. Then br1-s2 will reset curBlockReportRpcsSeen since curBlockReportId is not the same as in the report, which will discard the bit set for s1 in br2-s1, and the count of rpcsSeen = 0 will be wrong for br2-s2. So possibly unreported (zombie) storages will not be removed. LMK if you see what I see.

          Thanks for looking at the patch. I agree that in the case of interleaving, zombie storages will not be removed. I don't consider that a problem, since we will eventually get a non-interleaved full block report that will do the zombie storage removal. In practice, interleaved block reports are extremely rare (we have never seen the problem described in this JIRA, after deploying to thousands of clusters).

          Maybe we should go with a different approach for this problem. A single block report can be split into multiple RPCs. Within a single block-report RPC the NameNode processes each storage under a lock, but then releases and re-acquires the lock for the next storage, so that multiple RPC reports can interleave due to multi-threading.

          Maybe I'm misunderstanding the proposal, but don't we already do all of this? We split block reports into multiple RPCs when the storage reports grow beyond a certain size.

          Approach. The DN should report the full list of its storages in the first block-report RPC. The NameNode first cleans up unreported storages and the replicas belonging to them, then starts processing the rest of the block reports as usual. So DataNodes explicitly report the storages that they have, which eliminates the NameNode guessing which storage is the last in the block report RPC.

          What does the NameNode do if the DataNode is restarted while sending these RPCs, so that it never gets a chance to send all the storages that it claimed existed? It seems like you will get stuck and not be able to accept any new reports. Or, you can take the same approach the current patch does, and clear the current state every time you see a new ID (but then you can't do zombie storage elimination in the presence of interleaving.)

          One approach that avoids all these problems is to avoid doing zombie storage elimination during FBRs entirely, and do it instead during DN heartbeats (for example). DN heartbeats are small messages that are never split, and their processing is not interleaved with anything.

          We agree that the current patch solves the problem of storages falsely being declared as zombies, I hope. I think that's a good enough reason to get this patch in, and then think about alternate approaches later.

          shv Konstantin Shvachko added a comment -

          Maybe I'm misunderstanding the proposal, but don't we already do all of this?

          Yes you misunderstood. This part is not my proposal. This is what we already do, and therefore I call them Constraints, because they complicate the Problem. The proposal is in the third bullet point titled Approach.

          What does the NameNode do if the DataNode is restarted while sending these RPCs, so that it never gets a chance to send all the storages that it claimed existed? It seems like you will get stuck

          No, I will not get stuck. All br-RPCs are completely independent of each other. It's just that one of them has all storages, and indicates to the NameNode that it should update its storage list for the DataNode. The NN processes as many such RPCs as the DN sends. If the DN dies the NN will declare it dead in due time, or if the DN restarts within 10 minutes it will send a new set of block reports from scratch. I do not see any inconsistencies.

          You can think of it as a new operation SyncStorages, which does just that - updates NameNode's knowledge of DN's storages. I combined this operation with the first br-RPC. One can combine it with any other call, same as you propose to combine it with the heartbeat. Except it seems a poor idea, since we don't want to wait for removal of thousands of replicas on a heartbeat.

          interleaved block reports are extremely rare

          You keep saying this. But it is not rare for me. Are you convincing me not to believe my eyes or that you checked the logs on your thousands of clusters? I did check mine.

          cmccabe Colin P. McCabe added a comment -

          You can think of it as a new operation SyncStorages, which does just that - updates NameNode's knowledge of DN's storages. I combined this operation with the first br-RPC. One can combine it with any other call, same as you propose to combine it with the heartbeat. Except it seems a poor idea, since we don't want to wait for removal of thousands of replicas on a heartbeat.

          Thanks for explaining your proposal a little bit more. I agree that enumerating all the storages in the first block report RPC is a fairly simple way to handle this, and shouldn't add too much size to the FBR. It seems like a better idea than adding it to the heartbeat, like I proposed. In the short term, however, I would prefer the current patch, since it involves no RPC changes, and doesn't require all the DataNodes to be upgraded before it can work.

          daryn Daryn Sharp added a comment -

          Still catching up and need to review the patch. First question: how is this interleaving happening on a frequent basis?

          An interesting observation (if I interpreted the logs correctly) is that processing all 4 storages with ~14k blocks/storage appears to take minutes. Tens of seconds appear to elapse between processing each storage. There is some serious contention that seems indicative of a nasty bug or a suboptimal configuration exacerbating this bug.

          Is the DN rpc timeout set to something very low? Has the number of RPC handlers been greatly increased? Are there frequent deletes of massive trees? Is there a lot of decomm'ing with a low check interval?

          daryn Daryn Sharp added a comment -

          Here's a possibly simpler approach. Throw a retriable exception if there's already a report being processed. Completely untested, based on 2.7.
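
          A minimal sketch of that idea, with placeholder names rather than the attached patch: track DataNodes whose report is currently being processed for the duration of the whole RPC, and ask a concurrent report from the same node to retry.

            import java.io.IOException;
            import java.util.Set;
            import java.util.concurrent.ConcurrentHashMap;

            public class ActiveReportTracker {
              static class RetriableReportException extends IOException {
                RetriableReportException(String msg) { super(msg); }
              }

              private final Set<String> activeReports = ConcurrentHashMap.newKeySet();

              /** Call once per block-report RPC, before processing any storage. */
              void startReport(String datanodeUuid) throws RetriableReportException {
                if (!activeReports.add(datanodeUuid)) {
                  throw new RetriableReportException("A block report from "
                      + datanodeUuid + " is already being processed");
                }
              }

              /** Call in a finally block once the whole RPC has been processed. */
              void finishReport(String datanodeUuid) {
                activeReports.remove(datanodeUuid);
              }
            }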

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          -1 patch 0m 4s HDFS-10301 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



          Subsystem Report/Notes
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12802218/HDFS-10301.sample.patch
          JIRA Issue HDFS-10301
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15359/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          shv Konstantin Shvachko added a comment -

          In the short term, however, I would prefer the current patch, since it involves no RPC changes, and doesn't require all the DataNodes to be upgraded before it can work.

          • I don't think my approach requires an RPC change, since the block-report RPC message already has all required structures in place. It should require only a change to the processing logic.
          • DataNodes will indeed need to be upgraded, but only if they split their block reports into multiple RPCs, because a full report already lists all storages. But even in the multi-RPC case it will only mean that zombie storages are not removed until the DataNodes are upgraded.
          • Colin, it would have been good to have an interim solution, but it does not seem reasonable to commit a patch which fixes one bug while introducing another.
            I traced back a series of jiras related to this problem. It looks like multiple storages were not thoroughly thought through in the beginning, and people have been solving problems as they appeared for a while. It feels like time for the right fix.
          shv Konstantin Shvachko added a comment -

          Daryn, I also proposed throwing an exception if there is a block report already in progress as an interim fix.
          But in your patch you add the current node to activeReports before processing each storage, and then remove it right after. So for the next storage activeReports will be empty, whether the reports are interleaving or not, and the exception is never thrown. LMK if I missed something.
          Also, your patch looks to be against 2.7 and does not apply to trunk.

          cmccabe Colin P. McCabe added a comment -

          Thanks for looking at this, Daryn Sharp. I'm not sure about the approach you proposed, though. If interleaved full block reports really are very common for Konstantin Shvachko, it seems like throwing an exception when these are received would be problematic. It sounds like there might be some implementation concerns as well, although I didn't look at the patch.

          Konstantin Shvachko wrote: I don't think my approach requires an RPC change, since the block-report RPC message already has all required structures in place. It should require only a change to the processing logic.

          Just to be clear. If what is being sent over the wire is changing, I would consider that an "RPC change." We can create an RPC change without modifying the .proto file-- for example, by choosing not to fill in some optional field, or filling in some other field.

          Colin, it would have been good to have an interim solution, but it does not seem reasonable to commit a patch which fixes one bug while introducing another.

          The patch doesn't introduce any bugs. It does mean that we won't remove zombie storages when interleaved block reports are received. But we are not handling this correctly right now either, so that is not a regression.

          Like I said earlier, I think your approach is a good one, but I think we should get in the patch I posted here. It is a very small and non-disruptive change which doesn't alter what is sent over the wire. It can easily be backported to stable branches. Why don't we commit this patch, and then work on a follow-on with the RPC change and simplification that you proposed?

          Show
          cmccabe Colin P. McCabe added a comment - Thanks for looking at this, Daryn Sharp . I'm not sure about the approach you proposed, though. If interleaved full block reports really are very common for Konstantin Shvachko , it seems like throwing an exception when these are received would be problematic. It sounds like there might be some implementation concerns as well, although I didn't look at the patch. Konstantin Shvachko wrote: I don't think my approach requires RPC change, since the block-report RPC message already has all required structures in place. It should require only the processing logic change. Just to be clear. If what is being sent over the wire is changing, I would consider that an "RPC change." We can create an RPC change without modifying the .proto file-- for example, by choosing not to fill in some optional field, or filling in some other field. Colin, it would have been good to have an interim solution, but it does not seem reasonable to commit a patch, which fixes one bug, while introducing another. The patch doesn't introduce any bugs. It does mean that we won't remove zombie storages when interleaved block reports are received. But we are not handling this correctly right now either, so that is not a regression. Like I said earlier, I think your approach is a good one, but I think we should get in the patch I posted here. It is a very small and non-disruptive change which doesn't alter what is sent over the wire. It can easily be backported to stable branches. Why don't we commit this patch, and then work on a follow-on with the RPC change and simplification that you proposed?
          Hide
          redvine Vinitha Reddy Gankidi added a comment -

          Assigning the ticket to myself so that I can upload a patch. Please review.

          Show
          redvine Vinitha Reddy Gankidi added a comment - Assigning the ticket to myself so that I can upload a patch. Please review.
          Hide
          redvine Vinitha Reddy Gankidi added a comment -

          I uploaded the patch HDFS-10301.004.patch. I have implemented the idea that Konstantin suggested, i.e., DNs explicitly report the storages that they have. This eliminates NN guessing which storage is the last in the block report RPC. In the case of an FBR, NameNodeRPCServer can retrieve the list of storages from the storage block report array. In the case that block reports are split, DNs send an additional StorageReportOnly RPC after sending the block reports for each individual storage. This StorageReportOnly RPC is sent as an FBR. This RPC contains all the storages that the DN has, with -1 as the number of blocks. A new enum STORAGE_REPORT_ONLY is introduced in BlockListAsLongs for this purpose.

          Zombie storage removal is triggered from the NameNodeRPCServer instead of the BlockManager since the RPCServer now has all the information required to construct the list of storages that the DN is reporting. After processing the block reports as usual, zombie storages are removed by comparing the list of storages in the block report and the list of storages that the NN is aware of for that DN.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 14s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          +1 mvninstall 6m 22s trunk passed
          +1 compile 0m 42s trunk passed
          +1 checkstyle 0m 30s trunk passed
          +1 mvnsite 0m 50s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 38s trunk passed
          +1 javadoc 1m 4s trunk passed
          +1 mvninstall 0m 46s the patch passed
          +1 compile 0m 39s the patch passed
          +1 javac 0m 39s the patch passed
          -1 checkstyle 0m 27s hadoop-hdfs-project/hadoop-hdfs: patch generated 25 new + 371 unchanged - 7 fixed = 396 total (was 378)
          +1 mvnsite 0m 47s the patch passed
          +1 mvneclipse 0m 9s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 1m 45s the patch passed
          +1 javadoc 1m 1s the patch passed
          -1 unit 58m 26s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 23s Patch does not generate ASF License warnings.
          77m 11s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:2c91fd8
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12805798/HDFS-10301.004.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux d79544c47ce6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / b4078bd
          Default Java 1.8.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/15531/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15531/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/15531/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15531/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15531/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          cmccabe Colin P. McCabe added a comment -

          Hi Vinitha Reddy Gankidi,

          Thanks for your interest in this. I wish I could get more people interested in this JIRA-- it has been hard to raise interest, unfortunately.

          Just to clarify, you don't need to assign a JIRA to yourself in order to post a patch or suggest a solution. In general, when someone is actively working on a patch, you should ask before reassigning their JIRAs to yourself.

          A whole separate RPC just for reporting the storages which are present seems excessive. It will add additional load to the namenode.

               if (node.leaseId == 0) {
          -      LOG.warn("BR lease 0x{} is not valid for DN {}, because the DN " +
          -               "is not in the pending set.",
          -               Long.toHexString(id), dn.getDatanodeUuid());
          -      return false;
          +      LOG.debug("DN {} is not in the pending set because BR with lease 0x{} was processed out of order",
          +          dn.getDatanodeUuid(), Long.toHexString(id));
          +      return true;
          

          The leaseId being 0 doesn't mean that the block report was processed out of order. If you manually trigger a block report with the hdfs dfsadmin -triggerBlockReport command, it will also have lease id 0. Legacy block reports will also have lease ID 0.

          In general, your solution doesn't fix the problem during upgrade and is a much bigger patch, which is why I think HDFS-10301.003.patch should be committed and the RPC changes should be done in a follow-on JIRA. I do not see us backporting RPC changes to all the stable branches.

          cmccabe Colin P. McCabe added a comment -

          Rebasing patch 003 on trunk.

          zhz Zhe Zhang added a comment -

          Colin P. McCabe Just a quick note that it's a new JIRA rule that you have to be either the assignee or a committer to attach a patch.

          cmccabe Colin P. McCabe added a comment -

          Oh, sorry! I didn't realize we had added a new rule about attaching patches.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 6m 47s trunk passed
          +1 compile 0m 44s trunk passed
          +1 checkstyle 0m 28s trunk passed
          +1 mvnsite 0m 49s trunk passed
          +1 mvneclipse 0m 11s trunk passed
          +1 findbugs 1m 37s trunk passed
          +1 javadoc 1m 6s trunk passed
          +1 mvninstall 0m 46s the patch passed
          +1 compile 0m 40s the patch passed
          +1 javac 0m 40s the patch passed
          -1 checkstyle 0m 26s hadoop-hdfs-project/hadoop-hdfs: patch generated 6 new + 293 unchanged - 0 fixed = 299 total (was 293)
          +1 mvnsite 0m 47s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 1m 43s the patch passed
          +1 javadoc 1m 2s the patch passed
          -1 unit 76m 2s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 19s Patch does not generate ASF License warnings.
          95m 7s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
            hadoop.hdfs.TestSafeMode



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:2c91fd8
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12805931/HDFS-10301.005.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 26dc17f5173c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 57c31a3
          Default Java 1.8.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/15548/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15548/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/15548/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15548/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15548/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          redvine Vinitha Reddy Gankidi added a comment -

          Thanks for your review Colin P. McCabe. By legacy reports do you mean block reports from DNs before the concept of leases was introduced for block reports?

          public synchronized boolean checkLease(DatanodeDescriptor dn,
                                                   long monotonicNowMs, long id) {
              if (id == 0) {
                LOG.debug("Datanode {} is using BR lease id 0x0 to bypass " +
                    "rate-limiting.", dn.getDatanodeUuid());
                return true;
              }
              NodeData node = nodes.get(dn.getDatanodeUuid());
              if (node == null) {
                LOG.info("BR lease 0x{} is not valid for unknown datanode {}",
                    Long.toHexString(id), dn.getDatanodeUuid());
                return false;
              }
              if (node.leaseId == 0) {
                LOG.warn("BR lease 0x{} is not valid for DN {}, because the DN " +
                         "is not in the pending set.",
                         Long.toHexString(id), dn.getDatanodeUuid());
                return false;
              }
          

          Isn't id equal to 0 for legacy block reports and when block reports are manually triggered? My understanding is that node.leaseId is set to zero only when the lease is removed. In my patch, the lease is removed by looking at the current rpc index in the block report context.

          if (context != null) {
                  if (context.getTotalRpcs() == context.getCurRpc() + 1) {
                    long leaseId = this.getBlockReportLeaseManager().removeLease(node);
                    BlockManagerFaultInjector.getInstance().removeBlockReportLease(node, leaseId);
                  }
          

          When storage reports are processed out of order, we may set node.leaseId=0 before all of the DN's storage reports have been processed. Therefore, we log a message and continue to process the storage report even if node.leaseId=0. Please let me know if you see any issue with this approach.

          During upgrades, we do not remove zombie storages. Once the upgrade is finalized, we go ahead and remove the zombie storages.

          if (nn.getFSImage().isUpgradeFinalized() && noStaleStorages) {
                Set<String> storageIDsInBlockReport = new HashSet<>();
                if (context.getTotalRpcs() == 1) {
                  for (StorageBlockReport report : reports) {
                    storageIDsInBlockReport.add(report.getStorage().getStorageID());
                  }
                  bm.removeZombieStorages(nodeReg, context, storageIDsInBlockReport);
                }
              }
          

          Can you please elaborate on what you meant by "In general, your solution doesn't fix the problem during upgrade". What problems do you foresee?

          I am currently investigating why the test TestAddOverReplicatedStripedBlocks failed.

          shv Konstantin Shvachko added a comment -

          Hey Colin, let's decide on the way to move forward. I do not see a point in making this change in two steps.

          • Your changes will essentially be completely removed by Vinitha's patch.
          • I do not see her patch introducing incompatible changes. So it can and should be backported through to branch 2.6.

          A thorough review is needed and will be quite helpful. I think the 004 patch covers

          • the upgrade case, that is, it works consistently for block reports from both old (pre-patch) and new (patched) DataNodes
          • the case when the entire block report is sent in a single RPC and
          • the case when block reports are split into multiple RPCs
          • the leases

          So apart from the failed test I do not see any issues. It would be good if you could take a fresh look, see if any corner cases were missed.

          cmccabe Colin P. McCabe added a comment -

          I never said that patch 004 introduced incompatible changes. I just argued that it was a bigger change than was necessary to fix the problem. All other things being equal, we would prefer a smaller change to a bigger one. The only argument you have given against my change is that it doesn't fix the problem in the case where full block reports are interleaved. But this is an extremely, extremely rare case, to the point where nobody else has even seen this problem in their cluster.

          I still think that patch 005 is an easier way to fix the problem. It's basically a simple bugfix to my original patch. However, if you want to do something more complex, I will review it. But I don't want to add any additional RPCs. We already have problems with NameNode performance and we should not be adding more RPCs when it's not needed. We can include the storage information in the first RPC of the block report as an optional field.

          redvine Vinitha Reddy Gankidi added a comment -

          I looked into why the test TestAddOverReplicatedStripedBlocks fails with patch 004. I don't completely understand why the test relies on the fact that zombie storages should be removed when the DN has stale storages. Probably the test needs to be modified. Here are my findings:

          With the patch, the test fails with the following error:

          java.lang.AssertionError: expected:<10> but was:<11>
          	at org.junit.Assert.fail(Assert.java:88)
          	at org.junit.Assert.failNotEquals(Assert.java:743)
          	at org.junit.Assert.assertEquals(Assert.java:118)
          	at org.junit.Assert.assertEquals(Assert.java:555)
          	at org.junit.Assert.assertEquals(Assert.java:542)
          	at org.apache.hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks.testProcessOverReplicatedAndMissingStripedBlock(TestAddOverReplicatedStripedBlocks.java:281)
          

          In the test, DFSUtil.createStripedFile is invoked in the beginning.

           /**
             * Creates the metadata of a file in striped layout. This method only
             * manipulates the NameNode state without injecting data to DataNode.
             * You should disable periodical heartbeat before use this.
             *  @param file Path of the file to create
             * @param dir Parent path of the file
             * @param numBlocks Number of striped block groups to add to the file
             * @param numStripesPerBlk Number of striped cells in each block
             * @param toMkdir
             */
            public static void createStripedFile(MiniDFSCluster cluster, Path file, Path dir,
                int numBlocks, int numStripesPerBlk, boolean toMkdir) throws Exception {
          

          This internally calls the DFSUtil.addBlockToFile method, which mimics block reports. While processing these reports, the NameNode updates the DataNode's storages. In the test output, you can see the storages being added.

          2016-05-26 17:10:03,330 [Thread-0] INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(912)) - Adding new storage ID 9505a2ad-78f4-45d7-9c13-2ecd92a06866 for DN 127.0.0.1:60835
          2016-05-26 17:10:03,331 [Thread-0] INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(912)) - Adding new storage ID d4bb2f70-4a1e-451f-9d47-a2967f819130 for DN 127.0.0.1:60839
          2016-05-26 17:10:03,332 [Thread-0] INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(912)) - Adding new storage ID 841fc92f-fa15-4ced-8487-96ca4e6996d0 for DN 127.0.0.1:60844
          2016-05-26 17:10:03,332 [Thread-0] INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(912)) - Adding new storage ID 304aaeeb-e2d0-4427-81c6-c79e4d0b6a4e for DN 127.0.0.1:60849
          2016-05-26 17:10:03,332 [Thread-0] INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(912)) - Adding new storage ID 2d046d66-26fc-448f-938c-04dda2ecf34a for DN 127.0.0.1:60853
          2016-05-26 17:10:03,333 [Thread-0] INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(912)) - Adding new storage ID 381d3151-e75e-434a-86f8-da5c83f22b19 for DN 127.0.0.1:60857
          2016-05-26 17:10:03,333 [Thread-0] INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(912)) - Adding new storage ID 71f72bc9-9c66-478f-a0d7-3f0c7fc23964 for DN 127.0.0.1:60861
          2016-05-26 17:10:03,333 [Thread-0] INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(912)) - Adding new storage ID 4dc539f3-b7a9-4145-a313-fa99ca1dd779 for DN 127.0.0.1:60865
          2016-05-26 17:10:03,333 [Thread-0] INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(912)) - Adding new storage ID 734ea366-e635-4715-97d5-196bfcdccb18 for DN 127.0.0.1:60869
          2016-05-26 17:10:03,334 [Thread-0] INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(912)) - Adding new storage ID c639de06-e85c-4e93-92d2-506a49d4e41c for DN 127.0.0.1:60835
          2016-05-26 17:10:03,343 [Thread-0] INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(912)) - Adding new storage ID a82ff231-d630-4799-907d-f0a72ff06b38 for DN 127.0.0.1:60839
          2016-05-26 17:10:03,343 [Thread-0] INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(912)) - Adding new storage ID 328c3467-0507-45fd-9aac-73a38165f741 for DN 127.0.0.1:60844
          2016-05-26 17:10:03,343 [Thread-0] INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(912)) - Adding new storage ID 0b2a3b7f-e065-4e9a-9908-024091393738 for DN 127.0.0.1:60849
          2016-05-26 17:10:03,344 [Thread-0] INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(912)) - Adding new storage ID 3654a0ce-8389-40bf-b8d3-08cc49895a7d for DN 127.0.0.1:60853
          2016-05-26 17:10:03,344 [Thread-0] INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(912)) - Adding new storage ID 8072cc31-5567-4c04-8f71-7a8ee03c2fe0 for DN 127.0.0.1:60857
          2016-05-26 17:10:03,344 [Thread-0] INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(912)) - Adding new storage ID 0202860d-4aad-4996-a325-23a34f052cb2 for DN 127.0.0.1:60861
          2016-05-26 17:10:03,345 [Thread-0] INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(912)) - Adding new storage ID 5415d95d-c173-4458-be78-d3fa95652589 for DN 127.0.0.1:60865
          2016-05-26 17:10:03,345 [Thread-0] INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(912)) - Adding new storage ID 14570c81-1dc1-4479-a65a-5b61944d4b94 for DN 127.0.0.1:60869
          2016-05-26 17:10:03,359 [IPC Server handler 9 on 60834] INFO  hdfs.StateChange (FSNamesystem.java:completeFile(2663)) - DIR* completeFile: /striped/file is closed by DFSClient_NONMAPREDUCE_865500748_10
          

          When these storages are added, lastBlockReportId is set to zero and the storages are considered stale. Since the DN doesn't know about these storages, they are not reported in the next block report, so they are treated as zombie storages and removed. One of these zombie storages has a replica. Relevant logs:

          2016-05-26 17:10:03,383 [Block report processor] WARN  blockmanagement.BlockManager (BlockManager.java:removeZombieReplicas(2239)) - processReport 0x6aedc669a6437553: removing zombie storage c639de06-e85c-4e93-92d2-506a49d4e41c, which no longer exists on the DataNode.
          2016-05-26 17:10:03,384 [Block report processor] WARN  blockmanagement.BlockManager (BlockManager.java:removeZombieReplicas(2263)) - processReport 0x6aedc669a6437553: removed 0 replicas from storage c639de06-e85c-4e93-92d2-506a49d4e41c, which no longer exists on the DataNode.
          
          2016-05-26 17:10:03,416 [Block report processor] WARN  blockmanagement.BlockManager (BlockManager.java:removeZombieReplicas(2239)) - processReport 0xf7e24bf2690ca946: removing zombie storage 0202860d-4aad-4996-a325-23a34f052cb2, which no longer exists on the DataNode.
          2016-05-26 17:10:03,416 [Block report processor] WARN  blockmanagement.BlockManager (BlockManager.java:removeZombieReplicas(2263)) - processReport 0xf7e24bf2690ca946: removed 0 replicas from storage 0202860d-4aad-4996-a325-23a34f052cb2, which no longer exists on the DataNode.
          
          2016-05-26 17:10:04,217 [Block report processor] WARN  blockmanagement.BlockManager (BlockManager.java:removeZombieReplicas(2239)) - processReport 0xe361b2d0f2b49c0c: removing zombie storage 14570c81-1dc1-4479-a65a-5b61944d4b94, which no longer exists on the DataNode.
          2016-05-26 17:10:04,219 [Block report processor] WARN  blockmanagement.BlockManager (BlockManager.java:removeZombieReplicas(2263)) - processReport 0xe361b2d0f2b49c0c: removed 1 replicas from storage 14570c81-1dc1-4479-a65a-5b61944d4b94, which no longer exists on the DataNode.
          

          In patch 004, zombie storages are not removed when there are stale storages. Are there scenarios where this could happen? Since the zombie storages are not removed and one of the zombie storages has a replica, the assertion fails. This test was introduced in HDFS-8827.

          redvine Vinitha Reddy Gankidi added a comment -

          If we do have the check for stale storages before zombie storage removal, noStaleStorages in NameNodeRpcServer should be set to true when isStorageReport is true.

          daryn Daryn Sharp added a comment -

          Catching up: in my sample patch I think the logic should have been in fsn#blockReport instead of bm#processReport. I had created it hastily, as an example.

          Now consider that we are really trying to band-aid an edge case caused by multi-storage reports. I think we should also change the DN to always send per-storage reports going forward. We had to switch to per-storage reports a long time ago to avoid all kinds of issues, particularly at startup, e.g. call queue overflows, extremely high GC rate/time, full GCs, etc.

          cmccabe Colin P. McCabe added a comment -

          Vinitha Reddy Gankidi, the reason you are having trouble with stale storages versus zombie storages is that your patch uses a separate mechanism to detect which storages exist on the DN. The existing code doesn't have this problem because the full block report itself acts as the record of which storages exist. This is one negative side effect of the more complex approach. Another is that you are transmitting the same information about which storages are present multiple times.

          Despite these negatives, I'm still willing to review a patch that uses the more complicated method as long as you don't introduce extra RPCs. I agree that we should remove a stale storage if it doesn't appear in the full listing that gets sent. Just to be clear, I am -1 on a patch which adds extra RPCs. Perhaps you can send this listing in an optional field in the first RPC.
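          As a standalone illustration of what consuming such a listing amounts to (plain Java with invented names, not HDFS code): any storage the NameNode still tracks for a DataNode that is absent from the listing the DataNode sent is a zombie.

            import java.util.Arrays;
            import java.util.HashSet;
            import java.util.List;
            import java.util.Set;

            /** Toy sketch: zombie detection from an explicit storage listing. */
            class ZombieFromListingSketch {
              /** Tracked-but-unlisted storages are zombies. */
              static Set<String> findZombies(Set<String> trackedOnNN, List<String> listedByDN) {
                Set<String> zombies = new HashSet<>(trackedOnNN);
                zombies.removeAll(new HashSet<>(listedByDN));
                return zombies;
              }

              public static void main(String[] args) {
                Set<String> tracked = new HashSet<>(Arrays.asList("DS-1", "DS-2", "DS-3"));
                List<String> listed = Arrays.asList("DS-1", "DS-2"); // DS-3 no longer exists on the DN
                System.out.println(findZombies(tracked, listed));    // prints [DS-3]
              }
            }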

          Daryn Sharp, I don't like the idea of "band-aiding" this issue rather than fixing it at the root. Throwing an exception on interleaved storage reports, or forbidding combined storage reports, seem like very brittle work-arounds that could easily be undone by someone making follow-on changes. I came up with patch 005 and the earlier patches as a very simple fix that could easily be backported. If you are interested in something simple, then please check it out... or at least give a reason for not checking it out.

          shv Konstantin Shvachko added a comment -

          Vinitha's patch adds one RPC only in the case when block reports are sent in multiple RPCs. If you choose to send the entire block report in one RPC, then it will be a single RPC call with her patch as well. It seems logical to have an extra RPC because you have chosen to split block reports into multiple RPCs. We are well aware of the NameNode performance problems.
          Could you please review the patch.

          cmccabe Colin P. McCabe added a comment -

          Vinitha's patch adds one RPC only in the case when block reports are sent in multiple RPCs.

          The case where block reports are sent in multiple RPCs is exactly the case where scalability is the most important, since it indicates that we have a large number of blocks. My patch adds no new RPCs. If we are going to take an alternate approach, it should not involve a performance regression.

          Could you please review the patch.

          I did review the patch. I suggested adding an optional field in an existing RPC rather than adding a new RPC, and stated that I was -1 on adding new RPC load to the NN.

          zhz Zhe Zhang added a comment -

          Thanks for the discussions, Colin P. McCabe, Konstantin Shvachko, Vinitha Reddy Gankidi.

          I think the challenge here is that different deployments have different levels of 1) BR split; 2) BRs interleaving; 3) zombie storages. E.g. BR split might be completely turned off in configuration, and BR interleaving heavily depends on how busy the NN is.

          a) Patch v5 (from Colin) works well when BRs rarely interleave. In the worst case, a zombie storage could remain on NN for several full-BR cycles.
          b) Patch v4 (from Vinitha) works well when BRs are rarely split (or are split into many RPCs). The worst case is where each BR is split into a small number of RPCs – if each full BR is split into n RPCs, the relative overhead is 1 / n in terms of # of RPCs (see the small calculation after this list).
          c) As Colin suggested, we can also extend the first / last RPC in a full BR with the list of storages. By doing that we add overhead to every BR RPC (it needs to mark whether it has the list). Theoretically, the worst-case overhead is adding this to an empty BR.
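          To make the 1 / n overhead in b) concrete, here is a tiny illustrative calculation (plain Java, not HDFS code):

            /** Relative RPC overhead of option b): one extra storage-listing RPC per full
             *  block report that is split into n per-storage RPCs (an unsplit report adds
             *  no extra RPC). */
            class SplitOverheadSketch {
              public static void main(String[] args) {
                int[] splitCounts = {2, 3, 6, 60};   // n = per-storage RPCs in one full BR
                for (int n : splitCounts) {
                  double overhead = 1.0 / n;         // one extra RPC relative to n BR RPCs
                  System.out.printf("n = %d -> +%.0f%% block report RPCs%n", n, overhead * 100);
                }
              }
            }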

          So overall, I think c) is the best long term solution, because its worst case scenario is the least likely to happen, and the consequence is the most tolerable. It is more complex than b) though. Given the size of the v4 patch, are we OK to go with b) (v4 patch) first and do c) as a follow-on?

          cmccabe Colin P. McCabe added a comment -

          Sorry for the slow reply. I was on vacation.

          Like I said earlier, I am -1 on patch v4 because adding new RPCs is bad for NN scalability. I also think it's a much larger patch than needed. It doesn't make sense as an interim solution.

          Why don't we commit v5 and discuss improvements in a follow-on JIRA? So far there is no concrete argument against it other than the fact that it doesn't remove zombie storages in the case where BRs are interleaved. But we already know that BR interleaving is an extremely rare corner case-- otherwise you can bet that this JIRA would have attracted a lot more attention.

          shv Konstantin Shvachko added a comment -

          Sounds like you have been on a -1 spree lately, Colin P. McCabe. Hope you are alright.

          Here is why I think we should not commit your patch.

          1. The whole approach of keeping state for block report processing on the NameNode is error-prone. It assumes at-once execution, and therefore when block reports interleave the BR-state gets messed up. In particular, the BitSet used to mark storages that have been processed can be reset multiple times during interleaving and cannot be used to count storages in the report. In the current implementation this messing-up of the BR-state leads to false-positive detection of a zombie storage and removal of a perfectly valid one (a toy illustration follows after this list).
          2. Your patch leaves the messing-up of the BR-state in place (the BitSet is still inconsistent). It only tweaks it to avoid the false-positive. It still allows false-negatives, which lead to not detecting a zombie when it actually is present.
          3. So the correct solution for the problem is to remove the BR-state altogether, which is achieved in Vinitha's patch. And if we have a better solution, why settle for a temporary work-around? It may be a bigger change, but only because it removes the invalid logic related to the BR-state.
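          To make the false positive in point 1 concrete, here is a standalone toy model (plain Java with invented names, not the actual DatanodeDescriptor/BlockManager code) of per-storage report-id stamping being clobbered by an interleaved retransmission:

            import java.util.ArrayList;
            import java.util.HashMap;
            import java.util.List;
            import java.util.Map;

            /** Toy model of per-storage block-report-id stamping on the NameNode. */
            class InterleavedReportSketch {
              private final Map<String, Long> lastReportId = new HashMap<>();

              void processStorageReport(String storageId, long reportId) {
                lastReportId.put(storageId, reportId);  // stamp the storage with this report's id
              }

              /** Storages not stamped with the sweeping report's id are declared zombies. */
              List<String> zombieSweep(long reportId) {
                List<String> zombies = new ArrayList<>();
                for (Map.Entry<String, Long> e : lastReportId.entrySet()) {
                  if (e.getValue() != reportId) {
                    zombies.add(e.getKey());
                  }
                }
                return zombies;
              }

              public static void main(String[] args) {
                InterleavedReportSketch nn = new InterleavedReportSketch();
                // Report A (0xA) times out on the DataNode and is retransmitted as report B (0xB).
                // Their per-storage RPCs interleave on the NameNode:
                nn.processStorageReport("DS-1", 0xA);
                nn.processStorageReport("DS-1", 0xB);   // the retransmission overtakes report A
                nn.processStorageReport("DS-2", 0xA);   // report A's second storage lands last
                // A sweep keyed to report B sees DS-2 stamped with 0xA and wrongly calls it
                // a zombie, even though the DataNode still has it:
                System.out.println(nn.zombieSweep(0xB)); // prints [DS-2]
              }
            }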

          It seems that you don't, or don't want to, understand the reasoning around adding a separate storage-reporting RPC call. At least, you addressed it only by repeating your -1, for the third time, and did not respond to Zhe Zhang's proposal to merge the storage-reporting RPC into one of the storage reports in the next jira.
          Given that and in order to move forward, we should look into making changes to the last BR RPC call, which should now also report all storages.

          cmccabe Colin P. McCabe added a comment -

          Konstantin Shvachko, comments about me "being on a -1 spree" are not constructive and they don't do anything to help the tone of the discussion. We've been talking about this since April and my views have been consistent the whole time. I have a solution, but I am open to other solutions as long as they don't have big disadvantages.

          The whole approach of keeping the state for the block report processing on the NameNode is error-prone. It assumes at-once execution, and therefore when block reports interleave the BR-state gets messed up. Particularly, the BitSet used to mark storages, which have been processed, can be reset during interleaving multiple times and cannot be used to count storages in the report. In current implementation the messing-up of BR-state leads to false positive detection of a zombie storage and removal of a perfectly valid one.

          Block report processing is inherently about state. It is inherently stateful. It is a mechanism for the DN to synchronize its entire block state with the block state on the NN. Interleaved block reports are very bad news, even if this bug didn't exist, because they mean that the state on the NN will go "back in time" for some storages, rather than monotonically moving forward in time. This may lead the NN to make incorrect (and potentially irreversible) decisions like deleting a replica somewhere because it appears to exist on an old stale interleaved block report. Keep in mind that these old stale interleaved FBRs will override any incremental BRs that were sent in the meantime!

          Interleaved block reports also potentially indicate that the DNs are sending new full block reports before the last ones have been processed. So either our FBR retransmission mechanism is screwed up and is spewing a firehose of FBRs at an unresponsive NameNode (making it even more unresponsive, no doubt), or the NN can't process an FBR in the extremely long FBR sending period. Both of these explanations mean that you've got a cluster which has serious, serious problems and you should fix it right now.

          That's the reason why people are not taking this JIRA as seriously as they otherwise might-- because they know that interleaved FBRs mean that something is very wrong. And you are consistently ignoring this feedback and telling us how my patch is bad because it doesn't perform zombie storage elimination when FBRs get interleaved.

          It seems that you don't or don't want to understand reasoning around adding separate storage reporting RPC call. At least you addressed it only by repeating your -1. For the third time. And did not respond to Zhe Zhang's proposal to merge the storage reporting RPC into one of the storage reports in the next jira. Given that and in order to move forward, we should look into making changes to the last BR RPC call, which should now also report all storages.

          I am fine with adding storage reporting to any of the existing FBR RPCs. What I am not fine with is adding another RPC which will create more load.

          shv Konstantin Shvachko added a comment -

          Colin, you seem to imply that I ignored some of your questions. I don't see which. Could you please formulate your question so that I could answer it, if you have any.

          redvine Vinitha Reddy Gankidi added a comment -

          I uploaded another patch (006) that is similar to 005 but doesn't add any new RPCs. Please review it.
          When block reports are split, information about the other storages on the DN is sent along with the last storage's BR RPC. TestAddOverReplicatedStripedBlocks passes with this patch since zombie storages are removed even if there are stale storages.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 32s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 7m 24s trunk passed
          +1 compile 0m 48s trunk passed
          +1 checkstyle 0m 32s trunk passed
          +1 mvnsite 1m 1s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 51s trunk passed
          +1 javadoc 1m 2s trunk passed
          +1 mvninstall 0m 49s the patch passed
          +1 compile 0m 41s the patch passed
          +1 javac 0m 41s the patch passed
          -1 checkstyle 0m 27s hadoop-hdfs-project/hadoop-hdfs: The patch generated 4 new + 368 unchanged - 12 fixed = 372 total (was 380)
          +1 mvnsite 0m 56s the patch passed
          +1 mvneclipse 0m 9s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 2m 5s the patch passed
          +1 javadoc 1m 3s the patch passed
          -1 unit 90m 27s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          112m 3s



          Reason Tests
          Failed junit tests hadoop.hdfs.TestAsyncHDFSWithHA
            hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs
          Timed out junit tests org.apache.hadoop.hdfs.TestLeaseRecovery2



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:e2f6409
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12810994/HDFS-10301.006.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 0eace52e9fa4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 2449db5
          Default Java 1.8.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/15794/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15794/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/15794/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15794/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15794/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          redvine Vinitha Reddy Gankidi added a comment -

          The test failures don't seem to be introduced by the patch. These tests pass locally with the patch.

          cmccabe Colin P. McCabe added a comment -
          +        if (context.getTotalRpcs() == context.getCurRpc() + 1) {
          +          long leaseId = this.getBlockReportLeaseManager().removeLease(node);
          +          BlockManagerFaultInjector.getInstance().
          +              removeBlockReportLease(node, leaseId);
                   }
          +        LOG.debug("Processing RPC with index " + context.getCurRpc()
          +            + " out of total " + context.getTotalRpcs() + " RPCs in "
          +            + "processReport 0x" +
          +            Long.toHexString(context.getReportId()));
                 }
          

          This won't work in the presence of reordered RPCs. If the RPCs are reordered so that curRpc 1 arrives before curRpc 0, the lease will be removed and RPC 0 will be rejected.

              for (int r = 0; r < reports.length; r++) {
                final BlockListAsLongs blocks = reports[r].getBlocks();
                if (blocks != BlockListAsLongs.STORAGE_REPORT_ONLY) {
          

          Using object equality to compare two BlockListAsLongs objects is very surprising to anyone reading the code. In general, I find the idea of overloading the block list to sometimes not be a block list to be very weird and surprising. If we are going to do it, it certainly needs a lot of comments in the code to explain what's going on. I think it would be clearer and less error-prone just to add an optional list of storage ID strings in the .proto file.

              if (nn.getFSImage().isUpgradeFinalized()) {
                Set<String> storageIDsInBlockReport = new HashSet<>();
                if (context.getTotalRpcs() == context.getCurRpc() + 1) {
                  for (StorageBlockReport report : reports) {
                    storageIDsInBlockReport.add(report.getStorage().getStorageID());
                  }
                  bm.removeZombieStorages(nodeReg, context, storageIDsInBlockReport);
                }
              }
          

          This isn't going to work in the presence of reordered RPCs, is it? If curRpc 1 appears before curRpc 0, we'll never get into this clause at all and so zombies won't be removed. Considering you are so concerned that my patch didn't solve the interleaved and/or reordered RPC case, this seems like something you should solve. I also don't understand what the rationale for ignoring zombies during an upgrade is. Keep in mind zombie storages can lead to data loss under some conditions (see HDFS-7960 for details).

          shv Konstantin Shvachko added a comment -

          Colin, let me introduce to you Vinitha Reddy Gankidi. She works on the Hadoop team at LinkedIn. This is her first encounter with the HDFS community. Let's try to make it pleasant enough that she wants to come back and work with us more.

          Considering you are so concerned that my patch didn't solve ...

          I think this grumbling is directed at me, Konstantin Shvachko. Colin, just to clarify, the last patch was submitted by Vinitha Reddy Gankidi. She is a different person, not Konstantin Shvachko. If you have issues with me, let's try to keep them separate.

          cmccabe Colin P. McCabe added a comment -

          The "you" in that sentence was targeted at you, Konstantin Shvachko. I realized that Vinitha Reddy Gankidi wrote the patch, but I spoke imprecisely. Sorry for the confusion.

          This is her first encounter with the HDFS community. Let's try to make it pleasant enough that she wants to come back and work with us more.

          To be honest, I don't think this is a very good newbie JIRA. It is clearly a very controversial issue, and it's also a very difficult piece of code with a lot of subtlety. Since you clearly have strong opinions about this JIRA, I believe it would be more appropriate for you to post patches implementing your ideas yourself. But that is up to you, of course.

          redvine Vinitha Reddy Gankidi added a comment -

          Thanks for the review Colin. I have addressed your comments below:

          This won't work in the presence of reordered RPCs. If the RPCs are reordered so that curRpc 1 arrives before curRpc 0, the lease will be removed and RPC 0 will be rejected.

          If curRpc 1 arrives before curRpc 0, the lease will be removed and node.leaseId will be set to zero. I have modified BlockReportLeaseManager to return true when node.leaseId == 0. I explained this in my previous comment:
          https://issues.apache.org/jira/browse/HDFS-10301?focusedCommentId=15299255&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15299255
          Please let me know if you see any issues with this approach.

          Using object equality to compare two BlockListAsLongs objects is very surprising to anyone reading the code.

          I uploaded a new patch (007) to address this issue. I have added a method isStorageReportOnly() to BlockListAsLongs that returns true only for the STORAGE_REPORT_ONLY BlockListAsLongs.
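          For illustration only, one way such a predicate could be structured (a sketch with an assumed class name, not the committed code): the base class answers false and the storage-report-only sentinel overrides it, so callers never compare object references directly.

            // Sketch only; the real BlockListAsLongs hierarchy is more involved.
            abstract class BlockListAsLongsSketch {
              /** @return true only for the storage-report-only sentinel. */
              public boolean isStorageReportOnly() {
                return false;
              }

              /** Sentinel sent for storages whose replicas are reported in another RPC. */
              static final BlockListAsLongsSketch STORAGE_REPORT_ONLY =
                  new BlockListAsLongsSketch() {
                    @Override
                    public boolean isStorageReportOnly() {
                      return true;
                    }
                  };
            }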

          In the upgrade case, there is no way to detect the zombie storages, since old DNs do not send information about their storages in the last RPC of the BR. In practice, hot-swapping of DN drives and upgrading the DN may not happen at the same time.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 21s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 6m 12s trunk passed
          +1 compile 0m 43s trunk passed
          +1 checkstyle 0m 29s trunk passed
          +1 mvnsite 0m 49s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 41s trunk passed
          +1 javadoc 0m 54s trunk passed
          +1 mvninstall 0m 47s the patch passed
          +1 compile 0m 42s the patch passed
          +1 javac 0m 42s the patch passed
          -0 checkstyle 0m 29s hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 368 unchanged - 12 fixed = 370 total (was 380)
          +1 mvnsite 0m 50s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 47s the patch passed
          +1 javadoc 0m 55s the patch passed
          -1 unit 71m 23s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 19s The patch does not generate ASF License warnings.
          90m 9s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.datanode.TestBpServiceActorScheduler
            hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
            hadoop.hdfs.server.datanode.TestLargeBlockReport
            hadoop.hdfs.server.namenode.TestEditLog



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:85209cc
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12812962/HDFS-10301.007.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 30154e0a37c6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 0b9edf6
          Default Java 1.8.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/15898/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15898/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15898/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15898/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          shv Konstantin Shvachko added a comment -

          Colin P. McCabe, this JIRA needs some action from you, because you are blocking it.

          cmccabe Colin P. McCabe added a comment -

          I apologize for the delays in reviewing. I am looking at HDFS-10301.007.patch. Is this the latest patch?

          I don't understand the purpose behind BlockListAsLongs#isStorageReportOnly. This function is never called. This state doesn't seem to be stored anywhere in what is sent over the wire, either. Is this an idea that was half-implemented, or did I miss something?

                if (blocks != BlockListAsLongs.STORAGE_REPORT_ONLY) {
          

          Again, this is comparing by object reference equality, not deep equality. This is a comment I also made in the last review that wasn't addressed.

          My comment earlier is that I didn't want to overload block reports to be storage reports. A storage report is not a kind of block report. They shouldn't be using the same protobuf objects or Java data structures. This isn't addressed in the current patch, which continues the confusing practice of using the same data structure for both.

          In the upgrade case, there is no way to detect the zombie storages since the old DNs do not send the information about the storages in the BR in the last RPC. In practice, hot-swapping of DN drives and upgrading the DN may not happen at the same time.

          The set of storages that the DN reports can change for a variety of reasons, most of which are not hotswap related. One reason is because a drive has become bad and got kicked out of the set of currently active volumes. Another reason is because the DN got taken down by the administrator, a volume got removed, and the DN was brought back up. It's rather frustrating that your patch doesn't support zombie storage removal during upgrade, and mine does, and yet Konstantin Shvachko is blocking my patch.

          redvine Vinitha Reddy Gankidi added a comment -

          I apologize for attaching the wrong patch. Thanks for pointing it out Colin P. McCabe. I uploaded the correct patch now (008) that calls the isStorageReportOnly method. Adding an optional list of storage ID strings in the .proto file would add more overhead, since these optional parameters would have to be sent with default values for all other block report RPCs in addition to the last RPC of the block report. I can add more comments in the code to explain what's going on. Thoughts?

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 7m 35s trunk passed
          +1 compile 0m 49s trunk passed
          +1 checkstyle 0m 30s trunk passed
          +1 mvnsite 0m 56s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 47s trunk passed
          +1 javadoc 0m 56s trunk passed
          +1 mvninstall 0m 51s the patch passed
          +1 compile 0m 45s the patch passed
          +1 javac 0m 45s the patch passed
          -0 checkstyle 0m 28s hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 368 unchanged - 12 fixed = 370 total (was 380)
          +1 mvnsite 0m 51s the patch passed
          +1 mvneclipse 0m 9s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 51s the patch passed
          +1 javadoc 0m 55s the patch passed
          -1 unit 61m 58s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 22s The patch does not generate ASF License warnings.
          82m 35s



          Reason Tests
          Failed junit tests hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs
            hadoop.hdfs.server.namenode.TestEditLog
            hadoop.hdfs.TestFileChecksum



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12817774/HDFS-10301.008.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux f991214eeeeb 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / eb47163
          Default Java 1.8.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16045/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16045/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16045/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16045/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          shv Konstantin Shvachko added a comment -

          Reviewed latest patch. Got a few nits:

          1. In BlockManager.removeZombieStorages() you should add a check if (node == null); the node could have been deleted while we were not holding the writeLock (see the sketch after this list).
          2. The DatanodeDescriptor.removeZombieStorages() method does not need to be public; it should be package-private.
          3. Remove the empty-line change in BPServiceActor.blockReport().
            Also, the comment there is confusing; you might want to clarify it.
          4. The checkstyle warning says that either STORAGE_REPORT should be declared final or it should not be all-capital. I think final makes sense.
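          A minimal sketch of the guard from the first nit, assuming removeZombieStorages() re-resolves the DatanodeDescriptor after taking the write lock; names are illustrative, not the committed code.

            namesystem.writeLock();
            try {
              DatanodeDescriptor node = datanodeManager.getDatanode(nodeReg);
              if (node == null) {
                // The DataNode may have been removed while the write lock was not held.
                return;
              }
              // ... prune storages that are absent from the reported storage IDs ...
            } finally {
              namesystem.writeUnlock();
            }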

          Also I think that Colin P. McCabe's veto, formulated as
          I am -1 on a patch which adds extra RPCs.
          is fully addressed now. The storage report was added to the last RPC representing a single block report. The last patch does not add extra RPCs.
          So I plan to commit this three days from today, provided of course that the nits above are fixed.

          redvine Vinitha Reddy Gankidi added a comment -

          Attached a new patch (009) addressing Konstantin's comments. I cannot make STORAGE_REPORT final since it needs to be referenced from a static context. Instead, I renamed it to 'Storage_Report'.

          shv Konstantin Shvachko added a comment -

          All-capital identifiers are reserved for constants, that is, static final STORAGE_REPORT.

          redvine Vinitha Reddy Gankidi added a comment -

          I have made STORAGE_REPORT static final in the 010 patch.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 13m 6s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 7m 29s trunk passed
          +1 compile 0m 52s trunk passed
          +1 checkstyle 0m 33s trunk passed
          +1 mvnsite 0m 56s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 1m 53s trunk passed
          +1 javadoc 0m 59s trunk passed
          +1 mvninstall 0m 54s the patch passed
          +1 compile 0m 49s the patch passed
          +1 javac 0m 49s the patch passed
          -0 checkstyle 0m 30s hadoop-hdfs-project/hadoop-hdfs: The patch generated 4 new + 368 unchanged - 12 fixed = 372 total (was 380)
          +1 mvnsite 0m 56s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 2m 1s the patch passed
          +1 javadoc 0m 54s the patch passed
          -1 unit 59m 54s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 19s The patch does not generate ASF License warnings.
          93m 49s



          Reason Tests
          Failed junit tests hadoop.hdfs.TestFileCreationDelete
            hadoop.hdfs.server.namenode.TestEditLog



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12818632/HDFS-10301.009.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 2c2af2824bb8 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / c2bcffb
          Default Java 1.8.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16081/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16081/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16081/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16081/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          cmccabe Colin P. McCabe added a comment -

          --- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
          +++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java

          @@ -308,10 +308,10 @@ public synchronized boolean checkLease(DatanodeDescriptor dn,
                 return false;
               }
               if (node.leaseId == 0) {
          -      LOG.warn("BR lease 0x{} is not valid for DN {}, because the DN " +
          -               "is not in the pending set.",
          -               Long.toHexString(id), dn.getDatanodeUuid());
          -      return false;
          +      LOG.debug("DN {} is not in the pending set because BR with "
          +              + "lease 0x{} was processed out of order",
          +          dn.getDatanodeUuid(), Long.toHexString(id));
          +      return true;
               }
          

          There are other reasons why node.leaseId might be 0, besides block reports getting processed out of order. For example, an RPC could have gotten duplicated by something in the network. Let's not change the existing error message.

                      StorageBlockReport[] lastSplitReport =
                          new StorageBlockReport[perVolumeBlockLists.size()];
                      // When block reports are split, the last RPC in the block report
                      // has the information about all storages in the block report.
                      // See HDFS-10301 for more details. To achieve this, the last RPC
                      // has 'n' storage reports, where 'n' is the number of storages in
                      // a DN. The actual block replicas are reported only for the
                      // last/n-th storage.
          

          Why do we have to use such a complex and confusing approach? Like I commented earlier, a report of the existing storages is not the same as a block report. Why are we creating BlockListAsLongs objects that aren't lists of blocks?

          There is a much simpler approach, which is just adding a list of storage IDs to the block report RPC by making a backwards-compatible protobuf change. It's really easy:

          +repeated String allStorageIds = 8;
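          As a side note on why such an addition is backwards compatible: protobuf-java returns an empty list for a repeated field the sender never set, so an old DataNode simply omits it and the NameNode must treat "empty" as "unknown" rather than "no storages". A hedged sketch of the receiving side (the getter shape follows standard protobuf codegen; the types here are placeholders, not the real DatanodeProtocol classes):

            import java.util.HashSet;
            import java.util.List;
            import java.util.Optional;
            import java.util.Set;

            class AllStorageIdsSketch {
              // Empty list means the field was never set (an old DataNode), so the
              // caller should skip zombie-storage pruning entirely.
              static Optional<Set<String>> storageIdsForPruning(List<String> allStorageIds) {
                return allStorageIds.isEmpty()
                    ? Optional.empty()
                    : Optional.of(new HashSet<>(allStorageIds));
              }
            }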
          
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 24s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 7m 25s trunk passed
          +1 compile 0m 49s trunk passed
          +1 checkstyle 0m 30s trunk passed
          +1 mvnsite 0m 56s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 45s trunk passed
          +1 javadoc 0m 56s trunk passed
          +1 mvninstall 0m 51s the patch passed
          +1 compile 0m 44s the patch passed
          +1 javac 0m 44s the patch passed
          -0 checkstyle 0m 27s hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 368 unchanged - 12 fixed = 370 total (was 380)
          +1 mvnsite 0m 53s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 51s the patch passed
          +1 javadoc 0m 54s the patch passed
          +1 unit 58m 58s hadoop-hdfs in the patch passed.
          +1 asflicense 0m 19s The patch does not generate ASF License warnings.
          79m 20s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12818684/HDFS-10301.010.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux d118fdcd3ae4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 92fe2db
          Default Java 1.8.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16082/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16082/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16082/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          shv Konstantin Shvachko added a comment -

          > adding a list of storage IDs to the block report RPC by making a backwards-compatible protobuf change.

          The storage IDs are already there in the current BR protobuf. Why would you want a new field for that? You would need to duplicate all storage IDs in the case of a full block report, when it is not split into multiple RPCs. That seems confusing and inefficient to me.

          shv Konstantin Shvachko added a comment -

          Hey Vinitha Reddy Gankidi,

          1. Looks like checkstyle found two long lines in BlockManager.
          2. It also makes sense to keep the log message unchanged, as Colin P. McCabe suggested.

          I am +1 on the patch after this is fixed.

          redvine Vinitha Reddy Gankidi added a comment -

          > For example, an RPC could have gotten duplicated by something in the network.
          Colin P. McCabe, doesn't TCP ignore duplicate packets? Can you explain how this can happen? And if the RPC does get duplicated, then we shouldn't return true when node.leaseId == 0, right?

          redvine Vinitha Reddy Gankidi added a comment -

          Patch 011 fixes the two checkstyle issues and the log message.

          shv Konstantin Shvachko added a comment -

          This looks good to me. +1

          liuml07 Mingliang Liu added a comment -

          Thanks for the patch, Vinitha Reddy Gankidi. I'm catching up on all the insightful discussions here and have learned a lot.

          1. FSImage#isUpgradeFinalized is not volatile, and nn.getFSImage().isUpgradeFinalized() is called without holding the read lock in NameNodeRpcServer#blockReport(). Is this a problem? This is not very related to this issue though.

          2.

          TestNameNodePrunesMissingStorages.java
             for (Future<IOException> future: futureList) {
               try {
                 future.get();
               } catch (Exception e) {
                 LOG.error("Processing block report failed due to {}", e);
               }
             }
          

          I think we need to interpret the return value of future.get().
          If you're going to process exceptions thrown by the task, I don't think we need to return the exception explicitly, since Callable.call() is permitted to throw checked exceptions, which get propagated back to the calling thread (wrapped in an ExecutionException, IIRC); see the sketch at the end of this comment.

          3.

          TestNameNodePrunesMissingStorages.java
                DatanodeStorageInfo[] newStorageInfos = dnDescriptor.getStorageInfos();
                Assert.assertEquals(storageInfos.length, newStorageInfos.length);
                for (int i = 0; i < storageInfos.length; i++) {
                  Assert.assertTrue(storageInfos[i] == newStorageInfos[i]);
                }
          

          do you mean

          Assert.assertArrayEquals(storageInfos, dnDescriptor.getStorageInfos());
          
          Minor comments:
          1. We should add javadoc for STORAGE_REPORT, as its definition in the BlockListAsLongs abstract class is not straightforward.
          2. assert (blockList.getNumberOfBlocks() == -1); I believe we don't need to use an assert statement along with Assert.assertEquals()?
          3. Always use slf4j placeholders in the code, as you are doing in the latest patch. Specifically:
            BlockManager.java
                    LOG.debug("Processing RPC with index " + context.getCurRpc()
            	            + " out of total " + context.getTotalRpcs() + " RPCs in "
            	            + "processReport 0x" +
            	            Long.toHexString(context.getReportId()));
            

            We MUST use placeholders here to avoid string construction if the log level is INFO or above.
            More examples: LOG.info("Block pool id: " + blockPoolId); can be simplified to LOG.info("Block pool id: {}", blockPoolId);
            And for exceptions we don't need a placeholder if the exception is the last parameter, so LOG.error("Processing block report failed due to {}", e); can be LOG.error("Processing block report failed due to ", e);

          4. I see unnecessary blank lines in the v11 patch.
          5. I see not addressed long line checkstyle warnings in BlockManager
          6. if (nn.getFSImage().isUpgradeFinalized()) {
               Set<String> storageIDsInBlockReport = new HashSet<>();
               if (context.getTotalRpcs() == context.getCurRpc() + 1) {

             can be

             if (nn.getFSImage().isUpgradeFinalized() &&
                 context.getTotalRpcs() == context.getCurRpc() + 1) {
               Set<String> storageIDsInBlockReport = new HashSet<>();
          7. BPServiceActor.java
            DatanodeCommand cmd;
            if () {
              cmd = …
            } else {
              cmd = …
            }
            

            Let’s make cmd final.
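          To illustrate the point in comment 2 above (a self-contained sketch, not the actual test code): the Callable can simply throw its IOException, and the calling thread receives it from Future.get() wrapped in an ExecutionException.

            import java.io.IOException;
            import java.util.ArrayList;
            import java.util.List;
            import java.util.concurrent.Callable;
            import java.util.concurrent.ExecutionException;
            import java.util.concurrent.ExecutorService;
            import java.util.concurrent.Executors;
            import java.util.concurrent.Future;

            class CallableExceptionSketch {
              public static void main(String[] args) throws InterruptedException {
                ExecutorService pool = Executors.newFixedThreadPool(2);
                List<Future<Void>> futures = new ArrayList<>();
                Callable<Void> reportTask = () -> {
                  // Stand-in for a block report call; throws instead of returning
                  // the exception as a value.
                  throw new IOException("simulated block report failure");
                };
                futures.add(pool.submit(reportTask));
                for (Future<Void> future : futures) {
                  try {
                    future.get();
                  } catch (ExecutionException e) {
                    // The IOException thrown by call() arrives here as the cause.
                    System.err.println("Processing block report failed: " + e.getCause());
                  }
                }
                pool.shutdown();
              }
            }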

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 31s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 7m 4s trunk passed
          +1 compile 0m 46s trunk passed
          +1 checkstyle 0m 31s trunk passed
          +1 mvnsite 0m 52s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 44s trunk passed
          +1 javadoc 1m 1s trunk passed
          +1 mvninstall 0m 52s the patch passed
          +1 compile 0m 45s the patch passed
          +1 javac 0m 45s the patch passed
          +1 checkstyle 0m 30s hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 368 unchanged - 12 fixed = 368 total (was 380)
          +1 mvnsite 0m 54s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 49s the patch passed
          +1 javadoc 0m 54s the patch passed
          -1 unit 72m 6s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          92m 28s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeMXBean



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12818943/HDFS-10301.011.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 0cd8f805076b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 8fbe6ec
          Default Java 1.8.0_91
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16094/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16094/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16094/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 20s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 7m 34s trunk passed
          +1 compile 0m 48s trunk passed
          +1 checkstyle 0m 32s trunk passed
          +1 mvnsite 1m 2s trunk passed
          +1 mvneclipse 0m 15s trunk passed
          +1 findbugs 1m 51s trunk passed
          +1 javadoc 0m 55s trunk passed
          +1 mvninstall 0m 52s the patch passed
          +1 compile 0m 50s the patch passed
          +1 javac 0m 50s the patch passed
          +1 checkstyle 0m 27s hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 368 unchanged - 12 fixed = 368 total (was 380)
          +1 mvnsite 0m 54s the patch passed
          +1 mvneclipse 0m 9s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 54s the patch passed
          +1 javadoc 0m 55s the patch passed
          -1 unit 59m 47s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 21s The patch does not generate ASF License warnings.
          80m 52s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.TestEditLog



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12818943/HDFS-10301.011.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 1dc89d76ac9d 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 9ccf935
          Default Java 1.8.0_91
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16095/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16095/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16095/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 22s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 6m 40s trunk passed
          +1 compile 0m 47s trunk passed
          +1 checkstyle 0m 30s trunk passed
          +1 mvnsite 0m 54s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 44s trunk passed
          +1 javadoc 0m 54s trunk passed
          +1 mvninstall 0m 47s the patch passed
          +1 compile 0m 41s the patch passed
          +1 javac 0m 41s the patch passed
          +1 checkstyle 0m 28s hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 368 unchanged - 12 fixed = 368 total (was 380)
          +1 mvnsite 0m 48s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 47s the patch passed
          +1 javadoc 0m 53s the patch passed
          -1 unit 69m 17s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          88m 25s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.TestReconstructStripedBlocks
            hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12818943/HDFS-10301.011.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 0abbdfa64137 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 37362c2
          Default Java 1.8.0_91
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16096/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16096/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16096/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 28s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 6m 57s trunk passed
          +1 compile 0m 47s trunk passed
          +1 checkstyle 0m 31s trunk passed
          +1 mvnsite 0m 56s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 42s trunk passed
          +1 javadoc 0m 55s trunk passed
          +1 mvninstall 0m 48s the patch passed
          +1 compile 0m 44s the patch passed
          +1 javac 0m 44s the patch passed
          +1 checkstyle 0m 27s hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 368 unchanged - 12 fixed = 368 total (was 380)
          +1 mvnsite 0m 49s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 53s the patch passed
          +1 javadoc 0m 54s the patch passed
          -1 unit 69m 27s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 20s The patch does not generate ASF License warnings.
          89m 17s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.TestEditLog
            hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12818943/HDFS-10301.011.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 2d088f995b16 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 37362c2
          Default Java 1.8.0_91
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16097/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16097/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16097/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 8m 48s trunk passed
          +1 compile 0m 51s trunk passed
          +1 checkstyle 0m 32s trunk passed
          +1 mvnsite 0m 59s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 2m 3s trunk passed
          +1 javadoc 1m 0s trunk passed
          +1 mvninstall 0m 50s the patch passed
          +1 compile 0m 43s the patch passed
          +1 javac 0m 43s the patch passed
          +1 checkstyle 0m 27s hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 368 unchanged - 12 fixed = 368 total (was 380)
          +1 mvnsite 0m 49s the patch passed
          +1 mvneclipse 0m 9s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 46s the patch passed
          +1 javadoc 0m 53s the patch passed
          -1 unit 62m 57s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 40s The patch does not generate ASF License warnings.
          85m 20s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.ha.TestBootstrapStandby



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12818943/HDFS-10301.011.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 1697f8ceb2a6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 1c9d2ab
          Default Java 1.8.0_91
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16099/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16099/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16099/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Hide
          cmccabe Colin P. McCabe added a comment -

          Vinitha Reddy Gankidi asked: Colin P. McCabe Doesn't TCP ignore duplicate packets? Can you explain how this can happen? If the RPC does get duplicated, then we shouldn't return true right when node.leaseId == 0 ?

          That is a fair point. However, the retry logic in the RPC system could resend the message if the NN did not respond within a certain amount of time. Or there could just be a bug which leads to the DN sending full block reports when it shouldn't. In any case, we cannot assume that reordered messages are the problem.

          Konstantin Shvachko wrote: Also I think that Colin P. McCabe's veto, formulated as "I am -1 on a patch which adds extra RPCs", is fully addressed now. The storage report was added to the last RPC representing a single block report. The last patch does not add extra RPCs.

          Yes, this patch addresses my concerns. I withdraw my -1.

          Konstantin Shvachko wrote: The storage ids are already there in current BR protobuf. Why would you want a new field for that. You will need to duplicate all storage ids in case of full block report, when it is not split into multiple RPCs. Seems confusing and inefficient to me.

          A new field would be best because we would avoid creating fake BlockListAsLongs objects with length -1, and re-using protobuf fields for purposes they weren't intended for. A list of storage IDs is not a block report or a list of blocks, and using the same data structures is very confusing. If you want to optimize by not sending the list of storage reports separately when the block report has only one RPC, that's easy to do. Just check if numRpcs == 1 and don't set or check the optional list of strings in that case. I'm not going to block the patch over this, but I do think people reading this will wonder what you were thinking if you overload the PB fields in this way.
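          A hedged sketch of the "skip the list when numRpcs == 1" idea described above, under stated assumptions: the method names below (shouldAttachStorageIds, storagesMissingFromReport) and the plain-collection parameters are illustrative only and are not the patch's or HDFS's actual API; only the curRpc/totalRpcs accounting is taken from the discussion.

          import java.util.HashSet;
          import java.util.List;
          import java.util.Set;

          /** Illustrative sketch only; names and signatures are assumptions, not HDFS APIs. */
          class StorageIdListSketch {

            /**
             * Sender side: attach the full storage-ID list only on the last RPC of a
             * block report that is split into multiple RPCs. A single-RPC full report
             * already enumerates every storage it covers, so nothing extra is sent.
             */
            static boolean shouldAttachStorageIds(int curRpc, int totalRpcs) {
              return totalRpcs > 1 && curRpc + 1 == totalRpcs;
            }

            /**
             * Receiver side: given the storages currently known for the DataNode and
             * the IDs carried with the final RPC, compute which storages the report
             * never mentioned.
             */
            static Set<String> storagesMissingFromReport(List<String> knownStorageIds,
                                                         List<String> reportedStorageIds) {
              Set<String> missing = new HashSet<>(knownStorageIds);
              missing.removeAll(reportedStorageIds);
              return missing;
            }
          }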

          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 22s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 7m 11s trunk passed
          +1 compile 0m 49s trunk passed
          +1 checkstyle 0m 31s trunk passed
          +1 mvnsite 0m 53s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 1m 48s trunk passed
          +1 javadoc 0m 56s trunk passed
          +1 mvninstall 0m 49s the patch passed
          +1 compile 0m 45s the patch passed
          +1 javac 0m 45s the patch passed
          +1 checkstyle 0m 27s hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 368 unchanged - 12 fixed = 368 total (was 380)
          +1 mvnsite 0m 50s the patch passed
          +1 mvneclipse 0m 9s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 50s the patch passed
          +1 javadoc 0m 51s the patch passed
          -1 unit 60m 44s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 21s The patch does not generate ASF License warnings.
          80m 49s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
            hadoop.hdfs.server.balancer.TestBalancer
            hadoop.hdfs.server.namenode.ha.TestHAFsck



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12818943/HDFS-10301.011.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 6d72422a28d7 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 1c9d2ab
          Default Java 1.8.0_91
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16101/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16101/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16101/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 21s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 6m 57s trunk passed
          +1 compile 0m 53s trunk passed
          +1 checkstyle 0m 36s trunk passed
          +1 mvnsite 1m 1s trunk passed
          +1 mvneclipse 0m 11s trunk passed
          +1 findbugs 1m 50s trunk passed
          +1 javadoc 0m 56s trunk passed
          +1 mvninstall 0m 53s the patch passed
          +1 compile 0m 47s the patch passed
          +1 javac 0m 47s the patch passed
          +1 checkstyle 0m 27s hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 368 unchanged - 12 fixed = 368 total (was 380)
          +1 mvnsite 0m 54s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 58s the patch passed
          +1 javadoc 0m 55s the patch passed
          -1 unit 60m 55s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 21s The patch does not generate ASF License warnings.
          81m 27s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.TestEditLog



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12818943/HDFS-10301.011.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux c29b1d2b82aa 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 38128ba
          Default Java 1.8.0_91
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16106/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16106/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16106/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Hide
          shv Konstantin Shvachko added a comment -

          I am canceling patch available, because Jenkins is spinning the build all over again. Some bug there?

          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 19s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 7m 28s trunk passed
          +1 compile 0m 46s trunk passed
          +1 checkstyle 0m 31s trunk passed
          +1 mvnsite 0m 52s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 55s trunk passed
          +1 javadoc 0m 58s trunk passed
          +1 mvninstall 0m 50s the patch passed
          +1 compile 0m 43s the patch passed
          +1 javac 0m 43s the patch passed
          +1 checkstyle 0m 28s hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 368 unchanged - 12 fixed = 368 total (was 380)
          +1 mvnsite 0m 50s the patch passed
          +1 mvneclipse 0m 9s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          -1 findbugs 1m 49s patch/hadoop-hdfs-project/hadoop-hdfs no findbugs output file (hadoop-hdfs-project/hadoop-hdfs/target/findbugsXml.xml)
          -1 javadoc 1m 1s hadoop-hdfs-project_hadoop-hdfs generated 7 new + 0 unchanged - 0 fixed = 7 total (was 0)
          -1 unit 17m 30s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          37m 48s



          Reason Tests
          Timed out junit tests org.apache.hadoop.hdfs.TestLeaseRecovery2
            org.apache.hadoop.hdfs.TestDatanodeDeath
            org.apache.hadoop.hdfs.TestPread
            org.apache.hadoop.hdfs.TestBlockStoragePolicy



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12818943/HDFS-10301.011.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux ea46cbba5d17 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 38128ba
          Default Java 1.8.0_91
          findbugs v3.0.0
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/16112/artifact/patchprocess/patch-findbugs-hadoop-hdfs-project_hadoop-hdfs.txt
          javadoc https://builds.apache.org/job/PreCommit-HDFS-Build/16112/artifact/patchprocess/diff-javadoc-javadoc-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16112/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16112/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16112/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 6m 54s trunk passed
          +1 compile 0m 46s trunk passed
          +1 checkstyle 0m 30s trunk passed
          +1 mvnsite 0m 55s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 46s trunk passed
          +1 javadoc 0m 56s trunk passed
          +1 mvninstall 0m 51s the patch passed
          +1 compile 0m 45s the patch passed
          +1 javac 0m 45s the patch passed
          +1 checkstyle 0m 28s hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 368 unchanged - 12 fixed = 368 total (was 380)
          +1 mvnsite 0m 49s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 49s the patch passed
          +1 javadoc 0m 53s the patch passed
          -1 unit 61m 39s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 22s The patch does not generate ASF License warnings.
          81m 18s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.TestEditLog



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12818943/HDFS-10301.011.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux c1a40f43f99c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 38128ba
          Default Java 1.8.0_91
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16111/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16111/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16111/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Hide
          shv Konstantin Shvachko added a comment -

          My general approach to protobuf structures is to minimize changes, especially with redundant fields.
          It is very easy to add fields, as you demonstrated, but you can never remove them.
          So add them only if you absolutely must.
          But different people can of course have different approaches.

          Hide
          andrew.wang Andrew Wang added a comment -

          My understanding of PB is that we have a fixed 4 bits for tags, so there isn't really overhead to adding more PB fields as long as they are optional or repeated. See: https://developers.google.com/protocol-buffers/docs/encoding

          Given that, I'd err on the side of readability rather than trying to reuse existing fields. Since block reports are a pretty infrequent operation, I wouldn't stress over a few bytes if we end up filling a required field with a dummy value. I agree with Colin that the current overloading of BlockListAsLongs is confusing.
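          For context on the overhead argument, a tiny worked example of general protocol buffer key encoding (independent of this patch): a field's key is encoded as (fieldNumber << 3) | wireType, so an optional field numbered 1 through 15 costs a single key byte when set and nothing at all when it is never set.

          public class ProtobufTagDemo {
            public static void main(String[] args) {
              int fieldNumber = 15;                     // largest field number whose key still fits in one byte
              int wireType = 2;                         // 2 = length-delimited (strings, nested messages)
              int key = (fieldNumber << 3) | wireType;  // (15 << 3) | 2 = 122 = 0x7A: one byte on the wire
              System.out.printf("key byte = 0x%02X%n", key);
              // Field number 16 would give (16 << 3) | 2 = 130, which exceeds 7 bits,
              // so its varint-encoded key takes two bytes. An optional field that is
              // never set is simply absent from the encoded message.
            }
          }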

          Hide
          shv Konstantin Shvachko added a comment -

          As I commented earlier, I am not in favor of adding redundant fields. The readability argument is also quite questionable, because you end up either filling storage information in two fields, or sending it in different fields for different types of block report messages.
          In more detail:

          • Suppose we introduced repeated String allStorageIds.
          • In a full report (one that is not split into multiple RPCs) we already have all storage ids listed in the StorageBlockReports, so we don't need allStorageIds. If we nevertheless fill allStorageIds it will be confusing.
          • In a report that is split into multiple RPCs we fill allStorageIds, because each RPC reports only one storage. So in this case we would use a different field to pass the storage IDs.
          • I think code is more readable when the same information is passed via the same fields, and is not duplicated.
          Hide
          redvine Vinitha Reddy Gankidi added a comment -

          Thanks for the review Mingliang Liu. I have attached a new patch (012) that addresses your comments.

          > FSImage#isUpgradeFinalized is not volatile and nn.getFSImage().isUpgradeFinalized() is not holding the read lock in NameNodeRpcServer#blockReport(). Is this a problem? This is not very related to this issue though.

          My patch does not make any changes to the isUpgradeFinalized method. If this is a problem, we should open another JIRA to address it.

          > If you're going to process exceptions thrown by the task, I think we don't need to return them explicitly, as Callable.call() is permitted to throw checked exceptions

          Thanks for the good suggestion! I have modified Callable.call() to return a DatanodeCommand and throw IOException. I don't explicitly catch the exception since JUnit will take care of it.

          > I think we need to interpret the return value of future.get()?

          future.get() returns a DatanodeCommand, which we don't care about and don't need to interpret.

          > do you mean Assert.assertArrayEquals(storageInfos, dnDescriptor.getStorageInfos());

          Yes, thanks for that! I have made the change.

          > We should add javadoc for STORAGE_REPORT as it's not that straightforwardly defined in the BlockListAsLongs abstract class.

          Added the doc

          > assert (blockList.getNumberOfBlocks() == -1); I believe we don't need to use an assert statement along with Assert.assertEquals()?

          I changed the assert to Assert.assertEquals. However, the existing test also uses assert, e.g. assert(numBlocksReported >= expectedTotalBlockCount);

          > Always use slf4j placeholders in the code as you are doing in the latest patch.

          Thanks for the tip! I noticed that placeholders were not used consistently. I tried to maintain the logging style that was already used in that particular file. I have modified all the log messages in my patch to use placeholders wherever possible. Slf4j was not used in some places, for instance in TestNameNodePrunesMissingStorages.

          > I see unnecessary blank lines in the v11 patch. I see unaddressed long-line checkstyle warnings in BlockManager.

          I noticed two blank lines in TestNameNodePrunesMissingStorages in the v11 patch and removed them. I do not see any checkstyle warnings.

          > if (nn.getFSImage().isUpgradeFinalized() &&
          >     context.getTotalRpcs() == context.getCurRpc() + 1) {
          >   Set<String> storageIDsInBlockReport = new HashSet<>();

          Combined as suggested.

          > BPServiceActor.java Let’s make cmd final.

          Since cmd was not final previously, I have left it unchanged.
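          Pulling the thread together, a minimal sketch of the pruning idea discussed above, with assumptions clearly marked: the interfaces and the storagesToPrune helper below are illustrative stand-ins, not the committed BlockManager code; only the last-RPC check, the upgrade-finalized guard, and the storageIDsInBlockReport set come from the discussion.

          import java.util.HashSet;
          import java.util.Set;

          class ZombieStoragePruningSketch {
            /** Minimal stand-ins for the real classes referenced in the comments above. */
            interface BlockReportContext { int getCurRpc(); int getTotalRpcs(); }
            interface DatanodeStorage { String getStorageID(); }

            /**
             * Run only on the final RPC of a block report, after upgrade is finalized:
             * any storage the NameNode knows for this DataNode that does not appear in
             * the set of reported storage IDs is a candidate for removal.
             */
            static Set<String> storagesToPrune(BlockReportContext context,
                                               boolean upgradeFinalized,
                                               Set<String> knownStorageIds,
                                               Iterable<DatanodeStorage> reportedStorages) {
              Set<String> toPrune = new HashSet<>();
              if (!upgradeFinalized
                  || context.getCurRpc() + 1 != context.getTotalRpcs()) {
                return toPrune;                         // act only on the report's last RPC
              }
              Set<String> storageIDsInBlockReport = new HashSet<>();
              for (DatanodeStorage s : reportedStorages) {
                storageIDsInBlockReport.add(s.getStorageID());
              }
              for (String id : knownStorageIds) {
                if (!storageIDsInBlockReport.contains(id)) {
                  toPrune.add(id);                      // not mentioned by the report
                }
              }
              return toPrune;
            }
          }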

          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 10m 58s trunk passed
          +1 compile 0m 45s trunk passed
          +1 checkstyle 0m 33s trunk passed
          +1 mvnsite 0m 54s trunk passed
          +1 mvneclipse 0m 19s trunk passed
          +1 findbugs 1m 49s trunk passed
          +1 javadoc 0m 56s trunk passed
          +1 mvninstall 0m 44s the patch passed
          +1 compile 0m 44s the patch passed
          +1 javac 0m 44s the patch passed
          +1 checkstyle 0m 27s hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 368 unchanged - 12 fixed = 368 total (was 380)
          +1 mvnsite 0m 51s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 48s the patch passed
          +1 javadoc 0m 53s the patch passed
          -1 unit 65m 0s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 23s The patch does not generate ASF License warnings.
          88m 51s



          Reason Tests
          Failed junit tests hadoop.hdfs.web.TestWebHdfsTimeouts



          Subsystem Report/Notes
          Docker Image: yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12819238/HDFS-10301.012.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux b189d80c0730 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 703fdf8
          Default Java 1.8.0_101
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16171/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16171/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16171/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          shv Konstantin Shvachko added a comment -

          TestWebHdfsTimeouts failure does not look to be related to the changes.
          The last patch looks good.

          shv Konstantin Shvachko added a comment -

          I just committed this to trunk. Congratulations Vinitha Reddy Gankidi!
          Also ported to branch-2 and branch-2.8.
          Will keep it open while a port to branch-2.7 / 2.6 is in the works.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          -1 patch 0m 6s HDFS-10301 does not apply to branch-2. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



          Subsystem Report/Notes
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12820078/HDFS-10301.branch-2.patch
          JIRA Issue HDFS-10301
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16182/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #10148 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10148/)
          HDFS-10301. Interleaving processing of storages from repeated block (shv: rev 85a20508bd04851d47c24b7562ec2927d5403446)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/BlockListAsLongs.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDnRespectsBlockReportSplitThreshold.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesBlockReportPerStorage.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNameNodePrunesMissingStorages.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
          redvine Vinitha Reddy Gankidi added a comment -

          Added a patch for branch-2.7.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 19s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 12m 8s branch-2.7 passed
          +1 compile 1m 2s branch-2.7 passed with JDK v1.8.0_101
          +1 compile 1m 2s branch-2.7 passed with JDK v1.7.0_101
          +1 checkstyle 0m 31s branch-2.7 passed
          +1 mvnsite 1m 1s branch-2.7 passed
          +1 mvneclipse 0m 17s branch-2.7 passed
          +1 findbugs 3m 10s branch-2.7 passed
          +1 javadoc 1m 8s branch-2.7 passed with JDK v1.8.0_101
          +1 javadoc 1m 55s branch-2.7 passed with JDK v1.7.0_101
          +1 mvninstall 1m 10s the patch passed
          +1 compile 1m 11s the patch passed with JDK v1.8.0_101
          +1 javac 1m 11s the patch passed
          +1 compile 1m 7s the patch passed with JDK v1.7.0_101
          +1 javac 1m 7s the patch passed
          -0 checkstyle 0m 29s hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 405 unchanged - 5 fixed = 408 total (was 410)
          +1 mvnsite 1m 0s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          -1 whitespace 0m 0s The patch has 7892 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          -1 whitespace 3m 14s The patch 196 line(s) with tabs.
          +1 findbugs 3m 14s the patch passed
          +1 javadoc 0m 57s the patch passed with JDK v1.8.0_101
          +1 javadoc 1m 38s the patch passed with JDK v1.7.0_101
          -1 unit 58m 42s hadoop-hdfs in the patch failed with JDK v1.7.0_101.
          -1 asflicense 0m 24s The patch generated 3 ASF License warnings.
          154m 32s



          Reason Tests
          JDK v1.8.0_101 Failed junit tests hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
            hadoop.hdfs.TestSafeMode
            hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot
          JDK v1.7.0_101 Failed junit tests hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
            hadoop.hdfs.TestRollingUpgrade
            hadoop.hdfs.server.datanode.TestBlockReplacement
            hadoop.hdfs.server.namenode.TestFileTruncate
            hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot



          Subsystem Report/Notes
          Docker Image: yetus/hadoop:c420dfe
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12821107/HDFS-10301.branch-2.7.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 995eee067a99 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2.7 / 6cb2e97
          Default Java 1.7.0_101
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_101 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16261/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/16261/artifact/patchprocess/whitespace-eol.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/16261/artifact/patchprocess/whitespace-tabs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16261/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_101.txt
          JDK v1.7.0_101 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16261/testReport/
          asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/16261/artifact/patchprocess/patch-asflicense-problems.txt
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16261/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          shv Konstantin Shvachko added a comment -

          The patch for branch-2.7 looks good.
          I just committed this. Thank you Vinitha.

          ebadger Eric Badger added a comment -

          Konstantin Shvachko, this breaks TestDataNodeVolumeFailure.testVolumeFailure(). blockReport() is called with context = null. Then inside blockReport we try to call methods on context while it is still null:

          java.lang.NullPointerException: null
          	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1342)
          	at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testVolumeFailure(TestDataNodeVolumeFailure.java:189)
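          A minimal defensive sketch (illustrative only, not the committed fix) of how a null context could be tolerated on the server side, assuming the 3-arg BlockReportContext constructor that also appears in the test fix below:

          // Sketch only; assumes BlockReportContext(totalRpcs, curRpc, reportId).
          import org.apache.hadoop.hdfs.server.protocol.BlockReportContext;

          class BlockReportContextDefaults {
            // Fall back to a single-RPC context (1 RPC total, index 0) when a
            // caller such as an old test passes null.
            static BlockReportContext orSingleRpc(BlockReportContext context) {
              return context != null
                  ? context
                  : new BlockReportContext(1, 0, System.nanoTime());
            }
          }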
          
          daryn Daryn Sharp added a comment -

          -1 This needs to be reverted, and I'm too git-ignorant to do it. Our sandbox clusters won't come out of safemode because the NN thinks the DNs are reporting -1 blocks. I see this patch returns -1 blocks for a "storage report". I need to catch up on this jira, but in the meantime it must be reverted.

          I find it odd this patch was committed with so many failed tests.

          redvine Vinitha Reddy Gankidi added a comment -

          Eric Badger Thanks for reporting this. TestDataNodeVolumeFailure does not call blockReport() with context=null on trunk. This was fixed as a part of HDFS-9260. We need to modify TestDataNodeVolumeFailure.testVolumeFailure() for branch-2.7 as well:

          -    cluster.getNameNodeRpc().blockReport(dnR, bpid, reports, null);
          +    cluster.getNameNodeRpc().blockReport(dnR, bpid, reports,
          +        new BlockReportContext(1, 0, System.nanoTime()));
          
          shv Konstantin Shvachko added a comment -

          And the rest of the tests are passing locally.

          shv Konstantin Shvachko added a comment -

          Looks like we need to fix TestDataNodeVolumeFailure for all 2.x branches. Will open a jira for that promptly.
          Sorry guys for breaking your build.

          Daryn Sharp, it seems that you are overreacting a bit. Only one test is broken. I reran the other tests reported by Jenkins. They all pass.
          Could you please elaborate on the problem with the sandbox cluster? If the NN doesn't come out of safe mode, wouldn't that be caught by unit tests?

          daryn Daryn Sharp added a comment -

          Block report processing does not need to be so complicated. Just ban single-rpc reports and the problem goes away. At most the DN is retransmitting the same storage report; reprocessing it should not be a problem.

          If the only objection is that multiple RPCs are a scalability issue, I completely disagree.

          1. A single RPC is not scalable. It will not work on clusters with many hundreds of millions of blocks.
          2. The size of the RPC quickly becomes an issue. The memory pressure and premature promotion rate - even with a huge young gen (8-16G) - are not sustainable.
          3. The time to process the RPC becomes an issue, and the DN timing out and retransmitting (which causes this jira's bug) becomes an issue.

          Per-storage block reports eliminated multiple full GCs (2-3 for 5-10mins each) during startup on large clusters.

          Please revert or I'll grab someone here to help me do it.

          daryn Daryn Sharp added a comment -

          If NN doesn't come out of safe mode, then wouldn't that be caught by unit tests.

          You have more faith in the unit tests than I do. I do not have time to fully debug why sandbox clusters are DOA when I object to the implementation anyway.

          shv Konstantin Shvachko added a comment -

          Daryn, I do not understand what you disagree with. And what is the problem with the implementation that you object to?
          Nobody is taking away per-storage block reports.

          If you don't have time to understand the jira and don't have time to look at your own sandbox cluster, then how can I help you?

          daryn Daryn Sharp added a comment -

          I've read this jira as I said I would, and I've looked at the patch.

          Our nightly build & deploy for 2.7 is broken. DNs claim to report thousands of blocks, NN says nope, -1. This should be reason enough to revert until we get to the bottom of it. We're reverting internally. If that fixes it, I will have someone help me revert tomorrow morning if not already.

          Why is this patch changing per-storage reports when it's the single-rpc report that is the problem? Is this change compatible?

          1. What does an old NN do if it gets this pseudo-report? Will it forget about all the blocks on the non-last storage?
          2. What does a new NN do when it gets old style reports? Will it remove all but the last storage?

          This zombie detection, report context, etc is getting out of hand. I don't understand why the zombie detection isn't based on the healthy storages in the heartbeat. Anything else gets flagged as failed and the heartbeat monitor disposes of them.

          shv Konstantin Shvachko added a comment -

          We are actively looking into a possible problem with this change. LMK if the revert fixes the problem. Just to clarify: are you using per-storage reports on your cluster?
          In the meantime, answering your questions, Daryn.

          Why is this patch changing per-storage reports when it's the single-rpc report that is the problem?
          The problem is both with single-rpc and per-storage reports. In the multi-rpc case DNs can send repeated RPCs for each storage, and this will cause incorrect zombie detection if the RPCs are processed out of order.

          Is this change compatible?
          Yes. The compatibility issues were discussed here above.

          What does an old NN do if it gets this pseudo-report?
          According to Rolling upgrade documentation we first upgrade NameNodes, then DataNodes. So in practice new DNs don't talk to old NNs.

          What does a new NN do when it gets old style reports? Will it remove all but the last storage?
          As mentioned in this comment, old DataNode reports will be processed as regular reports; only zombie storages will not be removed until the DNs are upgraded.
          During the upgrade no storages are removed.

          shv Konstantin Shvachko added a comment -

          Unfortunately, there seems to be a problem with the patch. The storage report is not recognized in certain cases.
          Will revert the commits.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #10189 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10189/)
          Revert "HDFS-10301. Interleaving processing of storages from repeated (shv: rev c4463f2ef20d2cb634a1249246f83c451975f3dc)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDnRespectsBlockReportSplitThreshold.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNameNodePrunesMissingStorages.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesBlockReportPerStorage.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/BlockListAsLongs.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
          kihwal Kihwal Lee added a comment -

          Konstantin Shvachko, thanks for the revert, but I think you missed branch-2.7.

          daryn Daryn Sharp added a comment -

          According to Rolling upgrade documentation we first upgrade NameNodes, then DataNodes. So in practice new DNs don't talk to old NNs.

          Although the docs claim downgrading the NN requires full downtime, or a rolling downgrade of the DNs first, we should make an effort to ensure DNs are compatible when possible. An emergency NN downgrade shouldn't require full downtime when a failover to the prior release would suffice.

          I don't like the idea of BRs triggering pruning of storages. That aside, the patch doesn't appear to close the race. The lock is released after the storage report is processed and re-acquired to find the "zombies". We're back to out-of-order processing of heartbeats, which I think is the real problem causing false positives.

          How about something like this? The DatanodeDescriptor tracks the last BlockReportContext#reportId. The value is updated when processing a BR, which has the latest value if the BR lease let it in. The heartbeat now includes the last used reportId. On the NN, if the heartbeat contains this field, the NN will ignore the heartbeat if it does not equal the DatanodeDescriptor's value. There are little details like DN re-registration resetting the field, etc., but wouldn't something simple like this work?
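          A rough sketch of that idea (the class, field, and method names below are hypothetical, not existing HDFS APIs; the real change would live in DatanodeDescriptor and the heartbeat handling):

          // Hypothetical sketch of the proposal above, not existing HDFS code.
          class ReportIdTracker {
            // Updated while processing a full block report for this datanode.
            private volatile long lastProcessedReportId;

            void onBlockReportProcessed(long reportId) {
              lastProcessedReportId = reportId;
            }

            // The NN would consult this before pruning storages on a heartbeat:
            // prune only if the heartbeat echoes the reportId of the block
            // report most recently processed for this datanode.
            boolean mayPruneOnHeartbeat(long reportIdFromHeartbeat) {
              return reportIdFromHeartbeat == lastProcessedReportId;
            }
          }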

          shv Konstantin Shvachko added a comment -

          Just pushed branch-2.7

          redvine Vinitha Reddy Gankidi added a comment -

          The real problem is the state associated with the Datanode (curBlockReportRpcsSeen, curBlockReportId) that is used to figure out when to remove zombie storages. This state gets messed up when block reports are processed out of order. The current patch still allows out-of-order processing of block reports but gets rid of this per-Datanode state.

          In patch 012, although the isStorageReport method returns true for the STORAGE_REPORT BlockListAsLongs, this method gets overridden to return false in the BufferDecoder. I have attached a new patch (013) that fixes this issue.
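          A minimal sketch of that pitfall (the classes below only loosely mirror BlockListAsLongs and its BufferDecoder; this is not the actual HDFS code):

          // Simplified illustration only, not the real BlockListAsLongs classes.
          abstract class Report {
            // A normal report is not the special "storage report" marker.
            boolean isStorageReport() { return false; }

            // Sentinel instance whose override marks it as a storage report.
            static final Report STORAGE_REPORT = new Report() {
              @Override
              boolean isStorageReport() { return true; }
            };
          }

          class BufferDecoder extends Report {
            // Pitfall: the decoder answers false unconditionally, so on the
            // receiving side a decoded STORAGE_REPORT loses its marker.
            @Override
            boolean isStorageReport() { return false; }
          }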

          shv Konstantin Shvachko added a comment -

          the patch doesn't appear to close the race.
          It does. The problem is not that we release the lock, but that there is block-report-related state in different places, particularly the BitSet in DatanodeDescriptor; see e.g. this comment under (1). The state can be reset by interleaving reports. So if we don't have the state, there is no race condition, because block reports are independent and can be processed in any order.
          The patch does just that: it removes the block-report-tracking state. See here under Approach. In an earlier version of the patch Vinitha introduced the storage report as a separate RPC, which was opposed by Colin. The latest patch incorporates the storage report into the RPC for the last storage. But the processing of all reports is still independent, therefore releasing the lock is not a problem.
          Just adding more details to Vinitha's response.
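          A conceptual sketch of that approach (the method below is illustrative and the removal hook is hypothetical, not the exact code in the patch): the last RPC of a full block report carries the complete set of storage IDs, so unreported storages can be pruned without any cross-RPC state.

          // Illustrative fragment; assumes the real DatanodeDescriptor /
          // DatanodeStorageInfo accessors, but removeZombieStorage() is a
          // hypothetical helper.
          void pruneUnreportedStorages(DatanodeDescriptor node,
              java.util.Set<String> storageIdsInBlockReport) {
            for (DatanodeStorageInfo storage : node.getStorageInfos()) {
              if (!storageIdsInBlockReport.contains(storage.getStorageID())) {
                // Storage never appeared in this block report: treat it as a zombie.
                removeZombieStorage(node, storage);
              }
            }
          }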

          wouldn't something simple like this work?
          I don't see how it will work. Not simple. The heartbeats can come at any time between reports or between storages and update the reportId.

          Daryn Sharp, I think removing the br-state substantially simplifies report processing and makes reports independent (or idempotent), which is important by itself and solves the problem of interleaving reports. The last patch solves the bug you reported (thanks) and provides a unit test for it. As you can see, this jira has been under development for quite a while; it would be good to commit it soon. Do you still stand behind your veto given the latest patch?

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 13s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 8m 43s trunk passed
          +1 compile 1m 2s trunk passed
          +1 checkstyle 0m 32s trunk passed
          +1 mvnsite 1m 10s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 1m 57s trunk passed
          +1 javadoc 0m 54s trunk passed
          +1 mvninstall 0m 48s the patch passed
          +1 compile 0m 45s the patch passed
          +1 javac 0m 45s the patch passed
          +1 checkstyle 0m 28s hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 368 unchanged - 12 fixed = 368 total (was 380)
          +1 mvnsite 0m 50s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 51s the patch passed
          +1 javadoc 0m 53s the patch passed
          -1 unit 58m 53s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 25s The patch does not generate ASF License warnings.
          81m 7s



          Reason Tests
          Failed junit tests hadoop.tracing.TestTracing
            hadoop.security.TestRefreshUserMappings



          Subsystem Report/Notes
          Docker Image: yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12822408/HDFS-10301.013.patch
          JIRA Issue HDFS-10301
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux b164d05d4a39 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 6255859
          Default Java 1.8.0_101
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16340/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16340/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16340/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          shv Konstantin Shvachko added a comment -

          I think Daryn Sharp's veto above was addressed. The reason was not clearly formulated, but was understandably related to a bug in the previous version of the patch. The bug is fixed, and the unit test is provided.
          I plan to commit this on Wednesday 08/10 if there are no further objections.

          daryn Daryn Sharp added a comment -

          I'll review today.

          daryn Daryn Sharp added a comment -

          My main objections (other than the fatal bug) are the incompatible change to the protocol coupled with essentially a malformed block report buffer. It's an attempt to shoehorn into the block report processing what should be handled by a heartbeat's storage reports.

          I think when you say my compatibility concern was addressed, it wasn't fixed in code, but stated as don't-do-that? Won't the empty storage reports in the last RPC cause an older NN to go into a replication storm? Full downtime on a ~5k-node cluster to roll back, then ~40 mins to go active, is unacceptable when a failover to the prior release would have worked if not for this patch.

          This approach will also negate asynchronously processing FBRs (like I did with IBRs).

          Zombies should be handled by the heartbeat's pruning of excess storages. As an illustration, shouldn't something close to this work?

          --- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
          +++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
          @@ -466,11 +466,16 @@ public void updateHeartbeatState(StorageReport[] reports, long cacheCapacity,
               setLastUpdateMonotonic(Time.monotonicNow());
               this.volumeFailures = volFailures;
               this.volumeFailureSummary = volumeFailureSummary;
          +
          +    boolean storagesUpToDate = true;
               for (StorageReport report : reports) {
                 DatanodeStorageInfo storage = updateStorage(report.getStorage());
                 if (checkFailedStorages) {
                   failedStorageInfos.remove(storage);
                 }
          +      // don't prune unless block reports for all the storages in the
          +      // heartbeat have been processed
          +      storagesUpToDate &= (storage.getLastBlockReportId() == curBlockReportId);
           
                 storage.receivedHeartbeat(report);
                 totalCapacity += report.getCapacity();
          @@ -492,7 +497,8 @@ public void updateHeartbeatState(StorageReport[] reports, long cacheCapacity,
               synchronized (storageMap) {
                 storageMapSize = storageMap.size();
               }
          -    if (storageMapSize != reports.length) {
          +    if (curBlockReportId != 0
          +        ? storagesUpToDate : storageMapSize != reports.length) {
                 pruneStorageMap(reports);
               }
             }
          @@ -527,6 +533,7 @@ private void pruneStorageMap(final StorageReport[] reports) {
                     // This can occur until all block reports are received.
                     LOG.debug("Deferring removal of stale storage {} with {} blocks",
                         storageInfo, storageInfo.numBlocks());
          +          storageInfo.setState(DatanodeStorage.State.FAILED);
                   }
                 }
               }
          

          The next heartbeat after all reports are sent triggers the pruning. Other changes are required, such as removal of much of the context processing code similar to the current patch.

          redvine Vinitha Reddy Gankidi added a comment -

          Daryn Sharp That is a good suggestion. Zombies should be handled by the heartbeat's pruning of excess storages.
          Why do we need to wait until block reports for all the storages in the heartbeat are processed?
          Do you want to submit a patch for this?

          cmccabe Colin P. McCabe added a comment -

          I don't think the heartbeat is the right place to handle reconciling the block storages. One reason is because this adds extra complexity and time to the heartbeat, which happens far more frequently than an FBR. We even talked about making the heartbeat lockless-- clearly you can't do that if you are traversing all the block storages. Taking the FSN lock is expensive and heartbeats are sent quite frequently from each DN-- every few seconds. Another reason reconciling storages in heartbeats is bad is because if the heartbeat tells you about a new storage, you won't know what blocks are in it until the FBR arrives. So the NN may end up assigning a bunch of new blocks to a storage which looks empty, but really is full.

          I came up with what I believe is the correct patch to fix this problem months ago. It's here as https://issues.apache.org/jira/secure/attachment/12805931/HDFS-10301.005.patch . It doesn't modify any RPCs or add any new mechanisms. Instead, it just fixes the obvious bug in the HDFS-7960 logic. The only counter-argument to applying patch 005 that anyone ever came up with is that it doesn't eliminate zombies when FBRs get interleaved. But this is not a good counter-argument, since FBR interleaving is extremely, extremely rare in well-run clusters. The proof should be obvious-- if FBR interleaving happened on more clusters, more people would hit this serious data loss bug.

          This JIRA has been extremely frustrating. It seems like most, if not all, of the points that I brought up in my reviews were ignored. I talked about the obvious problems with compatibility with Konstantin Shvachko's solution and even explicitly asked him to test the upgrade case. I told him that this JIRA was a bad one to give to a promising new contributor such as Vinitha Reddy Gankidi, because it required a lot of context and was extremely tricky. Both myself and Andrew Wang commented that overloading BlockListAsLongs was confusing and not necessary. The patch confused "not modifying the .proto file" with "not modifying the RPC content" which are two very separate concepts, as I commented over and over. Clearly these comments were ignored. If anything, I think Konstantin Shvachko got very lucky that the bug manifested itself quickly rather than creating a serious data loss situation a few months down the road, like the one I had to debug when fixing HDFS-7960.

          Again I would urge you to just commit patch 005. Or at least evaluate it.

          shv Konstantin Shvachko added a comment -

          Hey Colin P. McCabe, I agree with you that this jira is frustrating. And I find it hard to overestimate your contribution to this. All points that you brought up here were addressed, and on multiple occasions. If you choose or fail to hear and understand other people's arguments, then there is little one can do to help this. So I will ignore (now for real) all but one of your meta-comments, because they were answered multiple times. Should you have a question, please formulate it for me to answer.
          I do not think you are in a position to judge, on public lists, the qualifications of a community member to fix a bug without knowing him or her. I find it unprofessional and rude.
          Working with Vinitha I can say she is no newbie in Hadoop at all, even though she was not directly involved with the community until recently. You owe her an apology.
          Now to the subject of this issue.

          shv Konstantin Shvachko added a comment -

          Took some time to look into heartbeat processing and to consult with Vinitha.
          So heartbeats currently have logic to remove failed storages reported by DNs via VolumeFailureSummary. This happens in three steps:

          1. If DN reports a failed volume in a heartbeat (HDFS-7604), NN marks the corresponding DatanodeStorageInfo as FAILED. See DatanodeDescriptor.updateFailedStorage().
          2. When the HeartbeatManager.Monitor kicks in it checks the FAILED flag on the storage and does removeBlocksAssociatedTo(failedStorage). But it does not remove the storage itself. HDFS-7208
          3. On next heartbeat the DN will not report the storage that was previously reported as failed. This triggers NN to prune the storage DatanodeDescriptor.pruneStorageMap() because it doesn't contain replicas. HDFS-7596

          Essentially we already have a dual mechanism for deleting storages: one through heartbeats, another via block reports. So we can remove the redundancy. Daryn Sharp's idea simplifies a lot of code, does not require changes in any RPCs, is fully backward compatible, and eliminates the notion of zombie storage, which solves the interleaving report problem. I think we should go for it.
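
          For reference, a toy model of the three-step flow above (class and method names here are simplified stand-ins, not the actual DatanodeDescriptor / HeartbeatManager code):

            import java.util.HashMap;
            import java.util.HashSet;
            import java.util.Map;
            import java.util.Set;

            /** Toy model of the three-step storage removal flow; names are
             *  simplified stand-ins, not the real HDFS classes. */
            class StorageRemovalModel {
              enum State { NORMAL, FAILED }

              static class Storage {
                final String id;
                State state = State.NORMAL;
                final Set<String> blocks = new HashSet<>();
                Storage(String id) { this.id = id; }
              }

              private final Map<String, Storage> storageMap = new HashMap<>();

              // Step 1: a heartbeat reports the volume as failed, so the storage
              // is marked FAILED (compare DatanodeDescriptor.updateFailedStorage()).
              void onHeartbeatReportedFailure(String storageId) {
                Storage s = storageMap.get(storageId);
                if (s != null) {
                  s.state = State.FAILED;
                }
              }

              // Step 2: the background monitor removes the replicas of FAILED
              // storages but keeps the storage entry itself (compare
              // HeartbeatManager.Monitor / removeBlocksAssociatedTo()).
              void backgroundMonitorTick() {
                for (Storage s : storageMap.values()) {
                  if (s.state == State.FAILED) {
                    s.blocks.clear();
                  }
                }
              }

              // Step 3: the next heartbeat no longer lists the storage; since it
              // is now empty it is dropped from the map (compare pruneStorageMap()).
              void pruneUnreported(Set<String> reportedStorageIds) {
                storageMap.values().removeIf(
                    s -> !reportedStorageIds.contains(s.id) && s.blocks.isEmpty());
              }
            }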

          Initially I was concerned about removing storages in heartbeats, but

          1. We already do it anyway
          2. All heartbeats hold the FSN read lock, whether they report failed storages or not. The scanning of the storages takes a lock on the corresponding DatanodeDescriptor.storageMap, which is fine-grained.
          3. Storages are not actually removed in a heartbeat, only flagged as FAILED. The replica removal is performed by a background Monitor.
          4. If we decide to implement lock-less heartbeats we can move the storage reporting logic into a separate RPC periodically sent by DNs independently of and less frequently than regular heartbeats.
          redvine Vinitha Reddy Gankidi added a comment -

          Thanks Konstantin Shvachko for summarizing how zombies can be detected and appropriately handled using the existing mechanism in heartbeat. I am working on a patch that implements this.

          zhz Zhe Zhang added a comment -

          Some more background about TestAddOverReplicatedStripedBlocks. We developed the EC feature starting from the NameNode. To test the NameNode EC logic without the client being ready, we added several test methods to emulate blocks, such as createStripedFile and addBlockToFile. In this case, those "fake" block reports confused the NN.

          In this particular test, the below sequence happens:

          1. Client creates file on NameNode
          2. Client adds blocks to the file on NameNode without really creating the blocks on DN
          3. DN sends "fake" block reports to NN, with randomly generated storage IDs.
                  DatanodeStorage storage = new DatanodeStorage(UUID.randomUUID().toString());
                  StorageReceivedDeletedBlocks[] reports = DFSTestUtil
                      .makeReportForReceivedBlock(block,
                          ReceivedDeletedBlockInfo.BlockStatus.RECEIVED_BLOCK, storage);
                  for (StorageReceivedDeletedBlocks report : reports) {
                    ns.processIncrementalBlockReport(dn.getDatanodeId(), report);
                  }
            
          4. The above code (unintentionally) triggers the zombie storage logic because those randomly generated storages will not be in the next real BR.
          5. We inject real blocks onto the DNs. But out of 9 blocks in the group, we only injected 8. So when the NN receives the block reports triggered by cluster.triggerBlockReports(); at L257, it should delete internal block #8, which was reported in the "fake" BR but not in the real BR. The log for that is:
            [Block report processor] WARN  blockmanagement.BlockManager (BlockManager.java:removeZombieReplicas(2282)) - processReport 0xf79050ce694c3bfa: removed 1 replicas from storage 6c834645-8aec-48f2-ace8-122344e07e96, which no longer exists on the DataNode.
            

            6c834645-8aec-48f2-ace8-122344e07e96 is one of the randomly generated storages.

          I haven't fully understood how the above caused the test to fail. Hope it helps.

          arpitagarwal Arpit Agarwal added a comment -

          IIUC we need to fix this logic not just for pruning storages but also for deciding when to remove the block report lease.

          From BPServiceActor.java, we can assume at line 399 that the storage report just sent was processed successfully by the NameNode; i.e., the DataNode getting back success is sufficient to conclude the report was successfully processed.

           393         for (int r = 0; r < reports.length; r++) {
           394           StorageBlockReport singleReport[] = { reports[r] };
           395           DatanodeCommand cmd = bpNamenode.blockReport(
           396               bpRegistration, bpos.getBlockPoolId(), singleReport,
           397               new BlockReportContext(reports.length, r, reportId,
           398                   fullBrLeaseId, true));
           399           blockReportSizes.add(
           400               calculateBlockReportPBSize(useBlocksBuffer, singleReport));
           401           numReportsSent++;
           402           numRPCs++;
           403           if (cmd != null) {
           404             cmds.add(cmd);
           405           }
          

          The DN can include a flag in the last RPC message (i.e. when r == reports.length - 1) that tells the NameNode it is the last report in this batch and all previous ones were successfully processed, so it is safe to drop the lease and prune zombies.
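
          To make the idea concrete, a minimal standalone sketch of the DataNode side (the boolean flag is hypothetical, not an existing field of BlockReportContext; the types are simplified stand-ins):

            // Illustration only: the flag is hypothetical and the types are
            // simplified stand-ins for the real RPC interfaces.
            interface NameNodeRpc {
              void blockReport(Object perStorageReport, boolean lastRpcAndAllPriorProcessed);
            }

            class BlockReportSenderSketch {
              void sendSplitReport(NameNodeRpc nn, Object[] perStorageReports) {
                for (int r = 0; r < perStorageReports.length; r++) {
                  // If an earlier RPC had failed, an exception would have propagated
                  // and the whole report would be retried later, so reaching the last
                  // iteration implies all prior per-storage reports were processed.
                  boolean lastAndAllPriorProcessed = (r == perStorageReports.length - 1);
                  nn.blockReport(perStorageReports[r], lastAndAllPriorProcessed);
                }
              }
            }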

          Also +1 for Daryn Sharp's idea to ban single-RPC reports, as this approach cannot be used for single-RPC reports.

          redvine Vinitha Reddy Gankidi added a comment -

          Upon thorough investigation of the heartbeat logic, I have verified that unreported storages do get removed without any code change. Attached patch 014 eliminates the state tracking and the zombie storage removal logic introduced in HDFS-7960.
          I have added a unit test that verifies that when a DN storage with blocks is removed, this storage is removed from the DatanodeDescriptor as well and does not linger forever. Unreported storages are marked as FAILED in the updateHeartbeatState method when checkFailedStorages is true. Thus, when a DN storage is removed, it will be marked as FAILED in the next heartbeat.
          The storage removal happens in 2 steps after that (Refer Step 2 & 3 in https://issues.apache.org/jira/browse/HDFS-10301?focusedCommentId=15427387&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15427387).
          The test testRemovingStorageDoesNotProduceZombies introduced in HDFS-7960 passes by reducing the heartbeat recheck interval so that the test doesn't time out. By default, the Heartbeat Manager removes blocks associated with failed storages every 5 minutes.
          I have ignored testProcessOverReplicatedAndMissingStripedBlock in this patch. Please refer to HDFS-10854 for more details.
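
          For illustration, a minimal sketch of that kind of test setup with a reduced heartbeat recheck interval (class name, values, and assertions here are assumptions, not the actual patch 014 test code):

            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.hdfs.DFSConfigKeys;
            import org.apache.hadoop.hdfs.HdfsConfiguration;
            import org.apache.hadoop.hdfs.MiniDFSCluster;

            public class HeartbeatPruningSetupSketch {
              public static void main(String[] args) throws Exception {
                Configuration conf = new HdfsConfiguration();
                // Short heartbeat and recheck intervals so FAILED storages are
                // detected and their replicas removed well within the test timeout.
                conf.setLong(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 1L);                   // seconds
                conf.setInt(DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY, 1000); // ms
                MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
                    .numDataNodes(1)
                    .build();
                try {
                  cluster.waitActive();
                  // ... remove a storage directory on the DataNode, trigger
                  // heartbeats, and assert the storage eventually disappears
                  // from the DatanodeDescriptor ...
                } finally {
                  cluster.shutdown();
                }
              }
            }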

          redvine Vinitha Reddy Gankidi added a comment -

          Arpit Agarwal In the latest patch, the BR lease is removed when context.getTotalRpcs() == context.getCurRpc() + 1. If BRs are processed out of order or interleaved, the BR lease for the DN will be removed before all the BRs from the DN are processed. So I have modified the checkLease method in BlockReportLeaseManager to return true when node.leaseId == 0. Please let me know if you see any issues with this approach.
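
          In condensed form, the rule looks roughly like this (standalone illustration; the parameters are simplified stand-ins for BlockReportLeaseManager internals, not the literal patch):

            class LeaseCheckSketch {
              boolean checkLease(long reportedLeaseId, long currentLeaseIdForNode) {
                if (reportedLeaseId == 0) {
                  return true;  // reports sent without a lease are accepted
                }
                if (currentLeaseIdForNode == 0) {
                  // The lease was already released when the final RPC of the report
                  // was processed; accept the remaining out-of-order or interleaved
                  // RPCs instead of rejecting them.
                  return true;
                }
                return reportedLeaseId == currentLeaseIdForNode;
              }
            }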

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 20s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 8m 36s trunk passed
          +1 compile 0m 50s trunk passed
          +1 checkstyle 0m 32s trunk passed
          +1 mvnsite 1m 0s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 1m 55s trunk passed
          +1 javadoc 1m 1s trunk passed
          +1 mvninstall 0m 54s the patch passed
          +1 compile 0m 51s the patch passed
          +1 javac 0m 51s the patch passed
          -0 checkstyle 0m 30s hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 379 unchanged - 7 fixed = 382 total (was 386)
          +1 mvnsite 0m 55s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 2m 8s the patch passed
          +1 javadoc 0m 58s the patch passed
          -1 unit 64m 26s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          87m 4s



          Reason Tests
          Failed junit tests hadoop.hdfs.TestCrcCorruption



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-10301
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12828153/HDFS-10301.014.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 25ecab498dc4 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 72dfb04
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16726/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16726/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16726/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16726/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          shv Konstantin Shvachko added a comment -

          Vinitha, thanks for your thorough research. Minor things:

          1. In DatanodeDescriptor you should also remove 3 imports and EMPTY_STORAGE_INFO_LIST, which were used in removed methods only.
          2. Take a look at the checkstyle warning; it is something about a long line there.
          3. Checked that TestCrcCorruption does not fail for me.

          Did you try to set up a sandbox cluster with dfs.blockreport.split.threshold = 1?
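
          (For anyone reproducing this, a sketch of forcing split block reports in a test configuration, assuming the DFSConfigKeys constant for dfs.blockreport.split.threshold; on a real sandbox cluster the property would go into hdfs-site.xml instead.)

            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.hdfs.DFSConfigKeys;
            import org.apache.hadoop.hdfs.HdfsConfiguration;

            public class SplitBlockReportConfSketch {
              public static void main(String[] args) {
                Configuration conf = new HdfsConfiguration();
                // With the threshold at 1, any DataNode holding at least one block
                // sends its block report as one RPC per storage instead of a single
                // combined RPC, which makes interleaving scenarios reproducible.
                conf.setLong(DFSConfigKeys.DFS_BLOCKREPORT_SPLIT_THRESHOLD_KEY, 1L);
              }
            }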

          arpitagarwal Arpit Agarwal added a comment -

          I have modified the checkLease method in BlockReportLeaseManager to return true when node.leaseId == 0. Please let me know if you see any issues with this approach.

          Vinitha Reddy Gankidi, IIUC this workaround bypasses the lease checks but the last report detection logic still remains broken. I am no longer sure zombie removal in heartbeats is safe and I was probably wrong to add it in HDFS-7596. Zombie removal is safe just after processing all storage reports from a full block report. So I think we should fix "last report detection".

          I believe the following two changes will fix this problem (same suggestion as my previous comment):

          1. The DataNode sends a flag with the last RPC message that indicates all the previous reports have been successfully processed. This is guaranteed to be correct and removes the burden from the NN.
          2. Eliminate single-RPC reports as Daryn suggested.

          Any thoughts on this?

          Thanks Konstantin and Vinitha for reporting this problem and your marathon efforts to fix it. It is a hard problem so I request we aim for consensus before committing a fix.

          redvine Vinitha Reddy Gankidi added a comment -

          Arpit Agarwal Storage reports are anyway sent in heartbeats and these reports have the information required to prune zombie storages. These storages are only marked as FAILED in the heartbeat. The replicas are removed in background by the HeartbeatManager. Why exactly do you think zombie removal in heartbeats is not safe? Why do we need to wait for all storage block reports from a FBR?

          arpitagarwal Arpit Agarwal added a comment - edited

          I don't think it is safe to remove storages (and hence block replicas from memory) when the NameNode doesn't have up-to-date block replica state, because the block->storage mapping on the NameNode can be stale, e.g. due to the disk balancer moving replicas, or due to the way VolumeChoosingPolicy picks storages for new blocks.

          shv Konstantin Shvachko added a comment -

          It is still not clear which scenario concerns you. Arpit, could you please clarify?

          • The Balancer copies a replica from a source DN to a target DN and, when finished, sends an IBR with the target as the new replica location and a hint to remove the old replica from the source DN. If the source or the target storage fails during this, the transfer fails and the Balancer moves on. If either of the storages fails after the transfer, it is the same as a regular failure: the block becomes under-replicated and is recovered in due time.
          • For VolumeChoosingPolicy it is even more important to know early which storages failed in order to avoid choosing them as targets.

          In fact the code path of zombie storage removal via FBRs (introduced by HDFS-7960) is practically never triggered. Because heartbeats are much more frequent, the removal of zombies goes through heartbeats. So if this were unsafe, as you assume, we should already have the evidence, since it is happening right now.
          I agree this is complex, but we've learned a lot and now have a very good understanding of the workflow. Let's reach consensus. I thought we had a silent one, because nobody commented until the patch was submitted. It takes a lot of time and testing, on multiple branches, so waiting till the last moment is not productive.

          arpitagarwal Arpit Agarwal added a comment -

          Balancer copies a replica from a source DN to a target DN and when finished sends IBR with the target as a new replica location and a hint to remove old replica from the source DN. If the source or the target storage fails during this the transfer fails and Balancer moves on. If either of the storages fail after the transfer it is the same as the regular failure, the block will become under-replicated and recovered in due time.

          We've seen that IBRs are often delayed when the NN is overloaded, so the NN's view of the replica map can lag. But I agree that leaving zombie removals to heartbeats only fixes this bug and leaves us no worse than where we are today. The FBR vs. heartbeat discussion can be separate. If we go this way, let's fix the detection properly though. The last patch just no-ops the lease ID checks.

          For VolumeChoosingPolicy it is even more important to know early which storages failed in order to avoid choosing them as targets.

          By the way, the storage chosen by the NN is never used. The DN always uses the result of running the volume choosing policy locally.

          jingzhao Jing Zhao added a comment -

          Thanks for all the effort on this tricky issue, Vinitha Reddy Gankidi. One question about the latest patch: in updateHeartbeatState, checkFailedStorages is set to true only when either the DN reports a failed storage or the heartbeat is the first one since registration. Can this cover the DN hotswap case? For DN hotswap, I think the DN only sends an FBR to notify the NN about the change. Then if a fresh disk is used to replace a slow (but not failed) disk in hotswap, will we still hit