Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.20.204.0, 0.23.0
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In a big cluster, when namenode starts up, it takes a long time for namenode to process block reports from all datanodes. Because heartbeats processing get delayed, some datanodes are erroneously marked as dead, then later on they have to register again, thus wasting time.

      It would speed up starting time if the checking of dead nodes is disabled when namenode in safemode.
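The fix can be sketched as a guard in the periodic dead-node scan. The following is a hypothetical, simplified model, not the actual FSNamesystem code; the class and field names here are invented for illustration:

```java
// Simplified model of the idea: while the NameNode is in safemode,
// the dead-node check declares no datanode dead, so heartbeats delayed
// by block-report processing cannot cause spurious re-registrations.
public class DeadNodeCheckSketch {
    private boolean inSafeMode = true;   // hypothetical safemode flag
    private final long expireMillis;     // heartbeat expiry interval

    public DeadNodeCheckSketch(long expireMillis) {
        this.expireMillis = expireMillis;
    }

    /** True if the datanode should be marked dead. */
    public boolean isDead(long lastHeartbeatMillis, long nowMillis) {
        if (inSafeMode) {
            return false;  // skip dead-node checks during safemode
        }
        return nowMillis - lastHeartbeatMillis > expireMillis;
    }

    public void leaveSafeMode() {
        inSafeMode = false;
    }
}
```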

      1. deadnodescheck.patch
        0.6 kB
        Hairong Kuang
      2. deadnodescheck1_0.20-security.patch
        0.6 kB
        Matt Foley
      3. deadnodescheck1.patch
        0.6 kB
        Hairong Kuang

        Activity

        dhruba borthakur added a comment -

        Also, there is no advantage to marking datanodes as dead when the namenode is in safemode. The NN does not replicate blocks while in safemode anyway, so it makes a lot of sense not to mark datanodes as dead during that time.

        Hairong Kuang added a comment -

        This patch makes sure that NameNode does not check dead nodes before the under-replication queue is populated.

        dhruba borthakur added a comment -

        +1 code looks good. I think we do not need a unit test for this one.

        jinglong.liujl added a comment -

        I think this approach will hide the root cause, because in a large cluster we will hit the same issue, caused by concurrent block reports, even after we leave safemode.
        The root cause is that the current heartbeat is too heavyweight: too much is done in it, such as block reports, block-received notifications, and task assignment. To keep any one of these from blocking other datanodes' heartbeats, we could use a separate RPC (a lightweight heartbeat) for keep-alive. This heartbeat would only update a timestamp, to avoid losing datanodes, and should not require the FSNamesystem lock.
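The lightweight-heartbeat idea above could be sketched like this (purely illustrative; the class and method names are invented, not Hadoop APIs):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative keep-alive tracker: the heartbeat path only stamps a
// per-node timestamp, so it never needs the namesystem-wide lock that
// block-report processing holds.
public class LivenessTracker {
    private final ConcurrentHashMap<String, AtomicLong> lastSeen =
            new ConcurrentHashMap<>();

    /** Lock-free keep-alive: record that nodeId was alive at nowMillis. */
    public void keepAlive(String nodeId, long nowMillis) {
        lastSeen.computeIfAbsent(nodeId, k -> new AtomicLong())
                .set(nowMillis);
    }

    /** True if the node has not reported within expireMillis. */
    public boolean isStale(String nodeId, long nowMillis, long expireMillis) {
        AtomicLong t = lastSeen.get(nodeId);
        return t == null || nowMillis - t.get() > expireMillis;
    }
}
```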

        dhruba borthakur added a comment -

        I think this patch is good to go. jinglong.liujl's comment is good, but perhaps we can address the "lightweight heartbeat" in a separate jira?

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12467496/deadnodescheck.patch
        against trunk revision 1072023.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.hdfs.TestFileConcurrentReader

        -1 contrib tests. The patch failed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/180//testReport/
        Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/180//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/180//console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12467496/deadnodescheck.patch
        against trunk revision 1074282.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.hdfs.TestFileConcurrentReader

        -1 contrib tests. The patch failed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/216//testReport/
        Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/216//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/216//console

        This message is automatically generated.

        Matt Foley added a comment -

        Hi Hairong,
        This patch uses isPopulatingReplQueues() as a proxy for being in startup mode. Is it best to start de-registering datanodes as soon as the repl queues are started, or should it wait until the system actually leaves safe mode?

        If the issue is that it should wait until leaving safe mode, but only when it is in "startup safemode", that relates to HDFS-1726.
        Thanks.

        dhruba borthakur added a comment -

        I am thinking that the namenode should not mark datanodes as dead if the namenode is in safemode, irrespective of whether it is in startup-safemode or in manual-safemode. My reasoning is as follows:

        A couple of times, we have had failures of a few sets of racks. When this happened, we put the namenode in safemode to prevent a replication storm. When the namenode loses a large chunk of datanodes, it has to spend a lot of CPU resources processing block reports when the partitioned datanodes start rejoining the cluster; at that time it is better if we can prevent the datanodes from timing out, or else the storm of block reports causes other datanodes to time out, resulting in a never-ending cycle.

        Matt Foley added a comment -

        A couple more reasons for using "in safemode" rather than isPopulatingReplQueues():

        • The patch for replication queues to start populating before leaving safemode (HDFS-1476) competes with block reports for the global lock. This can delay the later block reports by several minutes, making the dead-node problem even worse.
        • Even a relatively small number of datanodes marked dead can prevent the cluster from ever coming out of safemode automatically, depending on how random the replica distribution is. In our shop, automatic exit is highly desired.
        Hairong Kuang added a comment -

        When the replication queue starts to populate (if we set the threshold small enough), block report traffic slows down dramatically. That's why I think it is unlikely for the NN to make a wrong decision on dead nodes. Using the window between when the replication queues are populated and safemode exit, the NN might be able to catch the datanodes that really died while in safemode.

        But if everybody thinks we should use safemode as the guard instead, I see the benefit too and I am not against it. Let me upload a new patch.

        Matt Foley added a comment -

        Great! Thanks, Hairong. This is key for making startup faster for big clusters.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12473959/deadnodescheck1.patch
        against trunk revision 1082263.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.hdfs.TestDecommission
        org.apache.hadoop.hdfs.TestFileConcurrentReader

        -1 contrib tests. The patch failed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/269//testReport/
        Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/269//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/269//console

        This message is automatically generated.

        Suresh Srinivas added a comment -

        +1 for the patch.

        Minor: I was wondering if we should add more comments in the code on why this is being done.

        Hairong Kuang added a comment -

        I just committed this!

        Matt Foley added a comment -

        Attaching the identical patch for hadoop-0.20-security. The only difference is the file name and a 56-line offset to accommodate the different code base.

        I'm not changing the workflow state because test-patch can't run on non-trunk anyway.

        Please review and commit to hadoop-0.20-security.

        Suresh Srinivas added a comment -

        I committed the patch to branch-0.20-security

        Suresh Srinivas added a comment -

        It is worth porting this to 0.22 release as well. Comments?

        Konstantin Shvachko added a comment -

        Yes, it would be good to have it in 0.22, since it's been committed to 0.20-security.

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #582 (See https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/582/)

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #643 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/643/)

        Owen O'Malley added a comment -

        Hadoop 0.20.204.0 was released.


          People

          • Assignee:
            Hairong Kuang
          • Reporter:
            Hairong Kuang
          • Votes:
            0
          • Watchers:
            8
