  1. Hadoop HDFS
  2. HDFS-8995

Flaw in registration bookkeeping can make DN die on reconnect

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.2, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Normally, datanodes re-register with the namenode after it has been unreachable for longer than the heartbeat expiration and then becomes reachable again. A datanode keeps retrying its last RPC call, such as an incremental block report or a heartbeat, and when the call finally gets through, the namenode tells the datanode to re-register.

      We have observed that some datanodes stay dead in such scenarios. Further investigation revealed that they were told to shut down by the namenode.

        Activity

        kihwal Kihwal Lee added a comment -
        2018-09-26 12:15:08,497 WARN datanode.DataNode: Block pool BP-xxx (Datanode Uuid xxx) service to the-namenode.elephantland.gov/
        10.2.3.4:8020 is shutting down
        org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.UnregisteredNodeException):
         Data node DatanodeRegistration(0.0.0.0, datanodeUuid=xxx, infoPort=, infoSecurePort=, ipcPort=, storageInfo=lv=-56;cid=CID-1xxx;c=xxx)
         is attempting to report storage ID abc. Node 10.100.100.100:100 (actual ip addr) is expected to serve this storage.
                at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanode(DatanodeManager.java:483)
                at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:3094)
                at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:6406)
                at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReceivedAndDeleted(NameNodeRpcServer.java:1200)
                at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolServerSideTranslatorPB.java:215)
                at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26632)
                at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
                at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
                at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2096)
                at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2092)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:415)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
                at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2090)
        
                at org.apache.hadoop.ipc.Client.call(Client.java:1451)
                at org.apache.hadoop.ipc.Client.call(Client.java:1382)
                at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
                at com.sun.proxy.$Proxy14.blockReceivedAndDeleted(Unknown Source)
                at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolClientSideTranslatorPB.java:240)
                at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reportReceivedDeletedBlocks(BPServiceActor.java:289)
                at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:692)
                at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:834)
                at java.lang.Thread.run(Thread.java:745)
        

        The namenode is saying that the incremental block report came with a DN registration containing the address 0.0.0.0. The DN sends that address only when registering, not with block reports. During registration, the NN sends the registration back with the address it saw, and the DN uses that returned registration from then on, so subsequent calls contain the address of its external interface. This exception trace suggests there is a bug in exception handling and re-registration.
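
The namenode-side check behind this exception can be pictured roughly as follows. This is a simplified sketch under stated assumptions, not the actual DatanodeManager code: the class `RegistrationCheckSketch`, its map, and its method names are hypothetical, and a plain `IllegalStateException` stands in for `UnregisteredNodeException`.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified, hypothetical model of the namenode-side lookup; not Hadoop's DatanodeManager. */
public class RegistrationCheckSketch {
  // Storage ID -> transfer address the namenode recorded at registration time.
  private final Map<String, String> expectedAddrByStorage = new HashMap<>();

  /** Records the address the namenode actually saw for this storage. */
  public void register(String storageId, String observedAddr) {
    expectedAddrByStorage.put(storageId, observedAddr);
  }

  /** Rejects a report whose registration address differs from the recorded one. */
  public void checkReport(String storageId, String reportedAddr) {
    String expected = expectedAddrByStorage.get(storageId);
    if (expected != null && !expected.equals(reportedAddr)) {
      // Stand-in for UnregisteredNodeException in the trace above.
      throw new IllegalStateException("Data node " + reportedAddr
          + " is attempting to report storage ID " + storageId
          + ". Node " + expected + " is expected to serve this storage.");
    }
  }
}
```

In this model, a report that carries the placeholder 0.0.0.0 address for a storage registered under a real address is rejected, which matches the message in the trace.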

        kihwal Kihwal Lee added a comment -

        Daryn Sharp did further analysis:

        It's a bug during re-registration. The DN is supposed to create a registration object which contains the 0.0.0.0 addr, pass it to the NN which updates the addr and returns it, then the DN saves the updated registration for future calls.

        The problem is the DN saves off the initial registration with 0.0.0.0 before it receives the NN's response. When the DN encounters an exception contacting the NN, it is left with the invalid registration containing 0.0.0.0.

        The fix is not saving the registration until the NN updates it. There are also a couple of places where the DN isn't updating all references to a new registration.
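
The ordering problem described in this analysis can be sketched as below. This is a minimal illustration, not the actual patch: `ReRegistrationSketch`, the `NameNode` interface, the use of a plain string as the registration, and the port 50010 are all assumptions made for the example.

```java
import java.io.IOException;

/** Hypothetical, simplified model of the datanode-side flow; not Hadoop's BPServiceActor. */
public class ReRegistrationSketch {
  /** Stand-in for the namenode RPC: returns the registration with the observed address. */
  public interface NameNode {
    String registerDatanode(String registration) throws IOException;
  }

  // Registration used for all later RPCs (heartbeats, block reports).
  private String savedRegistration;

  public ReRegistrationSketch(String initialRegistration) {
    this.savedRegistration = initialRegistration;
  }

  /** Buggy ordering: the placeholder is saved before the namenode has answered. */
  public void registerBuggy(NameNode nn) throws IOException {
    savedRegistration = "0.0.0.0:50010";
    savedRegistration = nn.registerDatanode(savedRegistration); // may throw
  }

  /** Fixed ordering: keep the new registration local until the namenode returns it. */
  public void registerFixed(NameNode nn) throws IOException {
    String fresh = "0.0.0.0:50010";
    savedRegistration = nn.registerDatanode(fresh); // on failure the old value survives
  }

  public String getSavedRegistration() {
    return savedRegistration;
  }
}
```

If the RPC throws, `registerBuggy` leaves the 0.0.0.0 placeholder in the saved field, while `registerFixed` leaves the previously saved registration untouched.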

        hitliuyi Yi Liu added a comment - - edited

        Yes. If re-registration fails, xferAddr in bpRegistration is left as 0.0.0.0:xx. The datanode then gets an UnregisteredNodeException from the NN on subsequent incremental block reports and heartbeats, which causes the BP-xxx service to shut down, as the exception log shows.

        The fix is not saving the registration until the NN updates it

        Agree.

        My one question: is the change in BPOfferService#registrationSucceeded and DataNode#bpRegistrationSucceeded necessary? A re-registration failure throws an exception, so only a successful registration reaches that logic and updates the variables if they are not null. That said, I think it is also OK to update them on every registration or re-registration.

        So +1 pending Jenkins.

        hitliuyi Yi Liu added a comment -

        Submit patch to trigger Jenkins.

        hadoopqa Hadoop QA added a comment -



        -1 overall

        Vote | Subsystem       | Runtime  | Comment
        0    | pre-patch       | 17m 41s  | Pre-patch trunk compilation is healthy.
        +1   | @author         | 0m 0s    | The patch does not contain any @author tags.
        -1   | tests included  | 0m 0s    | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1   | javac           | 7m 49s   | There were no new javac warning messages.
        +1   | javadoc         | 10m 1s   | There were no new javadoc warning messages.
        +1   | release audit   | 0m 22s   | The applied patch does not increase the total number of release audit warnings.
        -1   | checkstyle      | 1m 22s   | The applied patch generated 1 new checkstyle issues (total was 188, now 188).
        +1   | whitespace      | 0m 0s    | The patch has no lines that end in whitespace.
        +1   | install         | 1m 27s   | mvn install still works.
        +1   | eclipse:eclipse | 0m 33s   | The patch built with eclipse:eclipse.
        +1   | findbugs        | 2m 26s   | The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        +1   | native          | 3m 10s   | Pre-build of native portion
        -1   | hdfs tests      | 161m 22s | Tests failed in hadoop-hdfs.
             |                 | 206m 15s |



        Reason            | Tests
        Failed unit tests | hadoop.hdfs.server.namenode.TestEditLog
                          | hadoop.hdfs.web.TestWebHDFSOAuth2
                          | hadoop.hdfs.server.namenode.TestNameNodeMXBean



        Subsystem            | Report/Notes
        Patch URL            | http://issues.apache.org/jira/secure/attachment/12753300/HDFS-8995.patch
        Optional Tests       | javadoc javac unit findbugs checkstyle
        git revision         | trunk / 7ad3556
        checkstyle           | https://builds.apache.org/job/PreCommit-HDFS-Build/12228/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
        hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12228/artifact/patchprocess/testrun_hadoop-hdfs.txt
        Test Results         | https://builds.apache.org/job/PreCommit-HDFS-Build/12228/testReport/
        Java                 | 1.7.0_55
        uname                | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output       | https://builds.apache.org/job/PreCommit-HDFS-Build/12228/console

        This message was automatically generated.

        kihwal Kihwal Lee added a comment -

        Reran the failed test cases. They pass.

        -------------------------------------------------------
         T E S T S
        -------------------------------------------------------
        Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
        Running org.apache.hadoop.hdfs.server.namenode.TestEditLog
        Tests run: 23, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 71.907 sec - in org.apache.hadoop.hdfs.server.namenode.TestEditLog
        Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
        Running org.apache.hadoop.hdfs.server.namenode.TestNameNodeMXBean
        Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.427 sec - in org.apache.hadoop.hdfs.server.namenode.TestNameNodeMXBean
        Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
        Running org.apache.hadoop.hdfs.web.TestWebHDFSOAuth2
        Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.744 sec - in org.apache.hadoop.hdfs.web.TestWebHDFSOAuth2
        
        Results :
        
        Tests run: 30, Failures: 0, Errors: 0, Skipped: 0
        
        hitliuyi Yi Liu added a comment -

        +1, thanks Kihwal. Will commit it shortly.

        hitliuyi Yi Liu added a comment -

        Committed to trunk, branch-2, branch-2.7.2.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-trunk-Commit #8383 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8383/)
        HDFS-8995. Flaw in registration bookeeping can make DN die on reconnect. (Kihwal Lee via yliu) (yliu: rev 5652131d2ea68c408dd3cd8bee31723642a8cdde)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #339 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/339/)
        HDFS-8995. Flaw in registration bookeeping can make DN die on reconnect. (Kihwal Lee via yliu) (yliu: rev 5652131d2ea68c408dd3cd8bee31723642a8cdde)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk #2280 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2280/)
        HDFS-8995. Flaw in registration bookeeping can make DN die on reconnect. (Kihwal Lee via yliu) (yliu: rev 5652131d2ea68c408dd3cd8bee31723642a8cdde)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #331 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/331/)
        HDFS-8995. Flaw in registration bookeeping can make DN die on reconnect. (Kihwal Lee via yliu) (yliu: rev 5652131d2ea68c408dd3cd8bee31723642a8cdde)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #1066 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1066/)
        HDFS-8995. Flaw in registration bookeeping can make DN die on reconnect. (Kihwal Lee via yliu) (yliu: rev 5652131d2ea68c408dd3cd8bee31723642a8cdde)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #2261 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2261/)
        HDFS-8995. Flaw in registration bookeeping can make DN die on reconnect. (Kihwal Lee via yliu) (yliu: rev 5652131d2ea68c408dd3cd8bee31723642a8cdde)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #322 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/322/)
        HDFS-8995. Flaw in registration bookeeping can make DN die on reconnect. (Kihwal Lee via yliu) (yliu: rev 5652131d2ea68c408dd3cd8bee31723642a8cdde)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        sjlee0 Sangjin Lee added a comment -

        Does this issue exist in 2.6.x? Should this be backported to branch-2.6?

        djp Junping Du added a comment -

        Hi Yi Liu and Kihwal Lee, as Sangjin asked earlier, should we backport the fix to branch-2.6?

        djp Junping Du added a comment -

        Moving this out of 2.6.4 to 2.6.5, as it hasn't been updated for a while.


          People

          • Assignee:
            kihwal Kihwal Lee
            Reporter:
            kihwal Kihwal Lee