Hadoop HDFS / HDFS-3860

HeartbeatManager#Monitor may wrongly hold the writelock of namesystem

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 0.23.4, 2.0.2-alpha
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      In HeartbeatManager#heartbeatCheck, if a dead datanode is found, the monitor thread acquires the write lock of the namesystem and rechecks safe mode. If the namesystem is in safe mode, the monitor thread returns from heartbeatCheck without releasing the write lock. The monitor thread may then wrongly hold the write lock indefinitely, blocking every other thread that needs the namesystem lock.

      The attached test case tries to simulate this bad scenario.
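      The failure mode can be sketched in Java as follows. This is a minimal, hypothetical illustration, not the attached patch or the actual HeartbeatManager code: the class HeartbeatLockSketch, its methods, the inSafeMode flag, and the raceWindow hook are invented for the example, and a plain ReentrantReadWriteLock is assumed to stand in for the namesystem lock. The buggy shape rechecks safe mode after lock() but before entering try, so its early return skips the unlock. The fixed shape moves the recheck inside try, so finally releases the lock on every path.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; names do not come from the Hadoop code base.
public class HeartbeatLockSketch {
    // Assumed stand-in for the namesystem lock.
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    private volatile boolean inSafeMode = false;
    private int removedDeadNodes = 0;

    void setSafeMode(boolean on) { inSafeMode = on; }
    boolean isWriteLocked() { return fsLock.isWriteLocked(); }
    int removedDeadNodes() { return removedDeadNodes; }

    // Buggy shape: the recheck sits between lock() and try, so its return skips unlock().
    void heartbeatCheckBuggy(boolean deadNodeFound, Runnable raceWindow) {
        if (inSafeMode) return;          // first check, no lock held
        raceWindow.run();                // test hook: safe mode may be entered here
        if (deadNodeFound) {
            fsLock.writeLock().lock();
            if (inSafeMode) {
                return;                  // BUG: the write lock stays held
            }
            try {
                removedDeadNodes++;      // placeholder for removing the dead node
            } finally {
                fsLock.writeLock().unlock();
            }
        }
    }

    // Fixed shape: the recheck is inside try, so finally releases the lock on every path.
    void heartbeatCheckFixed(boolean deadNodeFound, Runnable raceWindow) {
        if (inSafeMode) return;
        raceWindow.run();
        if (deadNodeFound) {
            fsLock.writeLock().lock();
            try {
                if (inSafeMode) {
                    return;              // finally still runs
                }
                removedDeadNodes++;
            } finally {
                fsLock.writeLock().unlock();
            }
        }
    }

    // Asks a different thread to take the read lock without waiting.
    boolean otherThreadCanRead() {
        AtomicBoolean acquired = new AtomicBoolean(false);
        Thread t = new Thread(() -> {
            if (fsLock.readLock().tryLock()) {
                acquired.set(true);
                fsLock.readLock().unlock();
            }
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return acquired.get();
    }

    public static void main(String[] args) {
        HeartbeatLockSketch buggy = new HeartbeatLockSketch();
        buggy.heartbeatCheckBuggy(true, () -> buggy.setSafeMode(true));
        System.out.println("buggy: lock held = " + buggy.isWriteLocked()
                + ", other thread can read = " + buggy.otherThreadCanRead());

        HeartbeatLockSketch fixed = new HeartbeatLockSketch();
        fixed.heartbeatCheckFixed(true, () -> fixed.setSafeMode(true));
        System.out.println("fixed: lock held = " + fixed.isWriteLocked()
                + ", other thread can read = " + fixed.otherThreadCanRead());
    }
}
```

      Running main should print "buggy: lock held = true, other thread can read = false" followed by "fixed: lock held = false, other thread can read = true". The raceWindow hook only simulates safe mode being entered between the two checks; it is not how the attached test case works.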

      1. HDFS-3860.patch
        0.9 kB
        Jing Zhao
      2. HDFS-heartbeat-testcase.patch
        5 kB
        Jing Zhao

        Activity

        Suresh Srinivas added a comment -

        Jing, nice find. Submitting the patch.

        Suresh Srinivas added a comment -

        BTW could you please also ensure that this pattern of code is not repeated in any other places.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12542695/HDFS-3860.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

        org.apache.hadoop.hdfs.TestHftpDelegationToken

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3106//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3106//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3106//console

        This message is automatically generated.

        Aaron T. Myers added a comment -

        Oof, good catch, Jing. Fortunately this case seems like it would be pretty tough to hit: if the NameNode (NN) is already in safe mode (SM), HeartbeatManager#heartbeatCheck returns early, so to hit this the NN would have to enter SM in the very short window between that first check and the recheck made under the write lock. Still certainly worth fixing, though.

        The patch looks good to me. The findbugs warning is unrelated, and TestHftpDelegationToken is already known to be failing.

        +1, I'll commit this momentarily.

        Aaron T. Myers added a comment -

        I've just committed this to trunk and branch-2.

        Thanks a lot for the contribution, Jing.

        Suresh Srinivas added a comment -

        Thanks Aaron for committing the patch.

        BTW could you please also ensure that this pattern of code is not repeated in any other places.

        Going back to my previous comment, Jing: if possible, can you also check whether there are other such issues?
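        One way to keep this pattern from recurring, offered here as a hypothetical suggestion rather than anything done in this issue, is to make the release structural: wrap the lock in an AutoCloseable guard and acquire it with try-with-resources, so every exit path, including an early return, releases it. The LockGuard and LockGuardSketch names are invented for this sketch, and a ReentrantReadWriteLock is assumed to stand in for the namesystem lock.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; LockGuard is not a Hadoop API.
public class LockGuardSketch {
    // Holds an acquired lock; close() releases it.
    static final class LockGuard implements AutoCloseable {
        private final Lock lock;

        private LockGuard(Lock lock) { this.lock = lock; }

        static LockGuard acquire(Lock lock) {
            lock.lock();
            return new LockGuard(lock);
        }

        @Override
        public void close() { lock.unlock(); }
    }

    // Assumed stand-in for the namesystem lock.
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    private volatile boolean inSafeMode = false;

    void setSafeMode(boolean on) { inSafeMode = on; }
    boolean isWriteLocked() { return fsLock.isWriteLocked(); }

    // The early return cannot leak the lock: close() runs on every exit path.
    String removeDeadNode() {
        try (LockGuard guard = LockGuard.acquire(fsLock.writeLock())) {
            if (inSafeMode) {
                return "skipped: safe mode";
            }
            return "removed";       // placeholder for the real removal work
        }
    }

    public static void main(String[] args) {
        LockGuardSketch s = new LockGuardSketch();
        s.setSafeMode(true);
        System.out.println(s.removeDeadNode() + ", lock held = " + s.isWriteLocked());
        s.setSafeMode(false);
        System.out.println(s.removeDeadNode() + ", lock held = " + s.isWriteLocked());
    }
}
```

        The trade-off is one small allocation per acquisition; the plain lock()/try/finally idiom gives the same guarantee as long as nothing that can return or throw sits between lock() and try.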

        Jing Zhao added a comment -

        I just checked all the invocations of namesystem#writelock / writeunlock and did not find similar problems. I will check other similar code too.

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #2680 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2680/)
        HDFS-3860. HeartbeatManager#Monitor may wrongly hold the writelock of namesystem. Contributed by Jing Zhao. (Revision 1378228)

        Result = FAILURE
        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378228
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #2651 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2651/)
        HDFS-3860. HeartbeatManager#Monitor may wrongly hold the writelock of namesystem. Contributed by Jing Zhao. (Revision 1378228)

        Result = SUCCESS
        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378228
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #2715 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2715/)
        HDFS-3860. HeartbeatManager#Monitor may wrongly hold the writelock of namesystem. Contributed by Jing Zhao. (Revision 1378228)

        Result = SUCCESS
        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378228
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1149 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1149/)
        HDFS-3860. HeartbeatManager#Monitor may wrongly hold the writelock of namesystem. Contributed by Jing Zhao. (Revision 1378228)

        Result = FAILURE
        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378228
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1180 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1180/)
        HDFS-3860. HeartbeatManager#Monitor may wrongly hold the writelock of namesystem. Contributed by Jing Zhao. (Revision 1378228)

        Result = SUCCESS
        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378228
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
        Robert Joseph Evans added a comment -

        I pulled this into branch-0.23 too.

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Build #387 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/387/)
        svn merge -c 1378228 FIXES: HDFS-3860. HeartbeatManager#Monitor may wrongly hold the writelock of namesystem. Contributed by Jing Zhao. (Revision 1390632)

        Result = UNSTABLE
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1390632
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java

          People

          • Assignee: Jing Zhao
          • Reporter: Jing Zhao
          • Votes: 0
          • Watchers: 11

            Dates

            • Created:
              Updated:
              Resolved:
