Datanode sometimes does not shutdown on receiving upgrade shutdown command

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      When a datanode is told to shut down via the dfsadmin command during a rolling upgrade, it may not shut down. This is because not all writers have a responder running, but sendOOB() tries anyway. This causes an NPE and the shutdown thread dies, halting the shutdown after only shutting down DataXceiverServer.
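
      The failure mode described above can be illustrated with a minimal, self-contained sketch. It is not the Hadoop code: the `Writer` class, the `runShutdown` helper, and the step names are hypothetical stand-ins, and the only assumption carried over from this issue is that a writer may have no responder when the out-of-band (OOB) message is sent.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      // Hypothetical model of the bug; none of these names come from Hadoop.
      class ShutdownNpeSketch {
          /** Stand-in for a block writer whose responder may be absent (null). */
          static class Writer {
              final Runnable responder;
              Writer(Runnable responder) { this.responder = responder; }
              // Unguarded call: throws NullPointerException when there is no responder.
              void sendOOB() { responder.run(); }
          }

          /** Runs a mock shutdown on its own thread and returns the steps that completed. */
          static List<String> runShutdown(List<Writer> writers) {
              List<String> completed = new ArrayList<>();
              Thread shutdownThread = new Thread(() -> {
                  completed.add("step before OOB");
                  for (Writer w : writers) {
                      w.sendOOB();            // an NPE here kills the shutdown thread
                  }
                  completed.add("steps after OOB");
              });
              // Keep the sketch quiet; the default handler would print the stack trace.
              shutdownThread.setUncaughtExceptionHandler((t, e) -> { });
              shutdownThread.start();
              try {
                  shutdownThread.join();
              } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
                  throw new IllegalStateException(e);
              }
              return completed;
          }

          public static void main(String[] args) {
              List<Writer> writers = new ArrayList<>();
              writers.add(new Writer(null)); // a writer without a responder
              System.out.println(runShutdown(writers));
          }
      }
      ```

      With a single responder-less writer, the mock shutdown thread dies at the OOB step and the later steps never run, which mirrors the partial shutdown reported here.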

      1. HDFS-7533-branch-2.6-v1.patch
        3 kB
        Chris Trezzo
      2. HDFS-7533.v1.txt
        3 kB
        Eric Payne

          Activity

          kihwal Kihwal Lee added a comment -

          Here is the stack trace.

          Exception in thread "Thread-153358" java.lang.NullPointerException
          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.sendOOB(BlockReceiver.java:753)
          at org.apache.hadoop.hdfs.server.datanode.DataXceiver.sendOOB(DataXceiver.java:167)
          at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.sendOOBToPeers(DataXceiverServer.java:241)
          at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1605)
          at org.apache.hadoop.hdfs.server.datanode.DataNode$4.run(DataNode.java:2885)

          kihwal Kihwal Lee added a comment -

          We can check whether a responder is running, but it may be in the process of shutting down. Therefore, a proper check requires additional locking. Alternatively, we can simply catch any Throwable and ignore it, so that the shutdown thread can complete the rest of the process. Since the out-of-band messaging is advisory, this is acceptable, IMO.
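
          The two options above can be contrasted in a rough sketch. Everything here is hypothetical (the `Writer` class, its lock, and both method names are illustrative, not the BlockReceiver or DataNode API); it only shows the shape of each approach.

          ```java
          import java.util.List;

          // Hypothetical sketch of the two options; not the actual Hadoop classes.
          class OobFixSketch {
              static class Writer {
                  private final Object lock = new Object();
                  private Runnable responder;   // null when no responder is running

                  Writer(Runnable responder) { this.responder = responder; }

                  /** Responder teardown takes the same lock as the guarded send. */
                  void stopResponder() {
                      synchronized (lock) { responder = null; }
                  }

                  /** Option 1: check and send under one lock. Returns true if the OOB was sent. */
                  boolean sendOOBIfRunning() {
                      synchronized (lock) {
                          if (responder == null) {
                              return false;
                          }
                          responder.run();
                          return true;
                      }
                  }

                  /** Unguarded send, as in the failing path: throws NPE without a responder. */
                  void sendOOB() { responder.run(); }
              }

              /** Option 2: treat OOB as advisory and swallow any failure. */
              static boolean shutdownIgnoringOobFailures(List<Writer> writers) {
                  try {
                      for (Writer w : writers) {
                          w.sendOOB();
                      }
                  } catch (Throwable t) {
                      // Ignored on purpose: the OOB message is advisory.
                  }
                  return true; // reached, so the rest of the shutdown can proceed
              }
          }
          ```

          Option 1 needs every code path that starts or stops a responder to take the same lock, which is the extra locking mentioned above. Option 2 is a smaller change, but in this sketch the first failure also skips the OOB message for any remaining writers.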

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12691530/HDFS-7533.v1.txt
          against trunk revision ef3c3a8.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9181//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9181//console

          This message is automatically generated.

          jlowe Jason Lowe added a comment -

          The "-1 overall" is unrelated, see HADOOP-11473.

          kihwal Kihwal Lee added a comment -

          +1 The patch looks good.

          kihwal Kihwal Lee added a comment -

          Thanks for fixing this, Eric.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #6845 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6845/)
          HDFS-7533. Datanode sometimes does not shutdown on receiving upgrade shutdown command. Contributed by Eric Payne. (kihwal: rev 6bbf9fdd041d2413dd78e2bce51abae15f3334c2)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeExit.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #72 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/72/)
          HDFS-7533. Datanode sometimes does not shutdown on receiving upgrade shutdown command. Contributed by Eric Payne. (kihwal: rev 6bbf9fdd041d2413dd78e2bce51abae15f3334c2)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeExit.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #806 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/806/)
          HDFS-7533. Datanode sometimes does not shutdown on receiving upgrade shutdown command. Contributed by Eric Payne. (kihwal: rev 6bbf9fdd041d2413dd78e2bce51abae15f3334c2)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeExit.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #69 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/69/)
          HDFS-7533. Datanode sometimes does not shutdown on receiving upgrade shutdown command. Contributed by Eric Payne. (kihwal: rev 6bbf9fdd041d2413dd78e2bce51abae15f3334c2)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeExit.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #2004 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2004/)
          HDFS-7533. Datanode sometimes does not shutdown on receiving upgrade shutdown command. Contributed by Eric Payne. (kihwal: rev 6bbf9fdd041d2413dd78e2bce51abae15f3334c2)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeExit.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #73 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/73/)
          HDFS-7533. Datanode sometimes does not shutdown on receiving upgrade shutdown command. Contributed by Eric Payne. (kihwal: rev 6bbf9fdd041d2413dd78e2bce51abae15f3334c2)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeExit.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2023 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2023/)
          HDFS-7533. Datanode sometimes does not shutdown on receiving upgrade shutdown command. Contributed by Eric Payne. (kihwal: rev 6bbf9fdd041d2413dd78e2bce51abae15f3334c2)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeExit.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          ctrezzo Chris Trezzo added a comment -

          Attached is a patch for branch 2.6. This was a trivial backport. I ran TestDataNodeExit and it passed.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Sangjin Lee backported this to 2.6.1. I just pushed the commit to 2.6.1 after running compilation and TestDataNodeExit, which was changed in the patch.

          jojochuang Wei-Chiu Chuang added a comment -

          Hi Kihwal Lee and Eric Payne,
          thanks for the effort fixing this issue. I just started Hadoop development recently and I am reviewing this issue. It looks to me that catching Throwable and ignoring it may cause other issues; for example, ignoring an OOME (OutOfMemoryError) can be pretty bad. How about adding a warning message after catching Throwable? It would be great if you could elaborate on this.

          Thank you

          yzhangal Yongjun Zhang added a comment -

          Hi Kihwal Lee and Eric Payne,

          Thanks for your earlier work here.

          It seems helpful to add a WARN message with e's info when Throwable is caught below. Would you please comment? Thanks much.

          +      try {
          +        xserver.sendOOBToPeers();
          +        ((DataXceiverServer) this.dataXceiverServer.getRunnable()).kill();
          +        this.dataXceiverServer.interrupt();
          +      } catch (Throwable e) {
          +        // Ignore, since the out of band messaging is advisory.
          +      }
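
          A rough sketch of what such a warning could look like follows. It is only an illustration: `sendOobQuietly` is a hypothetical helper, the `Runnable` stands in for the `sendOOBToPeers()` call, and `java.util.logging` is used here only to keep the example self-contained rather than the logging library the DataNode uses.

          ```java
          import java.util.logging.Level;
          import java.util.logging.Logger;

          // Hypothetical sketch: log the swallowed Throwable instead of dropping it silently.
          class OobWarnSketch {
              private static final Logger LOG =
                  Logger.getLogger(OobWarnSketch.class.getName());

              /** Runs the advisory OOB step; returns false (after logging) if it failed. */
              static boolean sendOobQuietly(Runnable sendOobToPeers) {
                  try {
                      sendOobToPeers.run();
                      return true;
                  } catch (Throwable t) {
                      // Still advisory, but the failure (including Errors) is now visible in the log.
                      LOG.log(Level.WARNING, "Ignoring failure while sending OOB to peers", t);
                      return false;
                  }
              }
          }
          ```

          The shutdown can still continue after the catch, while the log keeps a record of what was ignored.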
          
          yzhangal Yongjun Zhang added a comment -

          FYI Kihwal Lee and Eric Payne,

          Wei-Chiu Chuang created HDFS-9181 regarding the question I asked above. I would appreciate it if you could comment in that JIRA. Thanks.


            People

            • Assignee:
              eepayne Eric Payne
              Reporter:
              kihwal Kihwal Lee
            • Votes:
              0
              Watchers:
              12