Hadoop HDFS / HDFS-1250

Namenode accepts block report from dead datanodes

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.2, 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When a datanode's heartbeat times out, the namenode marks it dead. The next heartbeat from that datanode is rejected with a command telling the datanode to re-register. However, the namenode still accepts a block report from the datanode even though it is marked dead.
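
      For illustration, the missing guard amounts to something like the sketch below. This is a minimal sketch only; the class, field, and method names are hypothetical and do not correspond to the actual FSNamesystem code.

        import java.io.IOException;

        // Hypothetical sketch of the guard discussed in this issue; not the actual namenode code.
        class BlockReportGuardSketch {

          static class DatanodeState {
            boolean registered; // node has completed registration with the namenode
            boolean alive;      // cleared once the heartbeat monitor marks the node dead
          }

          /** Reject block reports from nodes that are unregistered or marked dead. */
          void processBlockReport(DatanodeState node, long[] reportedBlocks) throws IOException {
            if (node == null || !node.registered || !node.alive) {
              // Without this check, replicas reported by a dead node land in the block
              // map and clients are handed stale locations they cannot read.
              throw new IOException("Got block report from an unregistered or dead node");
            }
            // ... normal processing: reconcile reportedBlocks with the block map ...
          }
        }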

      Attachments

      1. HDFS-1250.patch
        8 kB
        Suresh Srinivas
      2. HDFS-1250.1.patch
        8 kB
        Suresh Srinivas
      3. HDFS-1250.y20.patch
        7 kB
        Suresh Srinivas


          Activity

          Suresh Srinivas added a comment -

          This creates several problems:

          1. Blocks from the dead datanode are added to the block map. Opening a file returns block locations on the dead datanode, and the client repeatedly fails to read them.
          2. Decommissioning the datanode does not resolve the problem, as the blocks remain in the block map. The only way to clear them is to restart the namenode.

          This problem was observed on a Yahoo cluster, where a datanode was marked dead but a block report from it still reached the namenode.

          Suresh Srinivas added a comment -

          Hudson is stuck. I ran testpatch; here is the result:
          [exec] -1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
          [exec] Please justify why no new tests are needed for this patch.
          [exec] Also please list what manual steps were performed to verify this patch.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          This change just tags the code with interface classification, hence tests are not included. I also ran the unit tests and they ran without failures.

          Suresh Srinivas added a comment -

          Ignore the previous comment. I posted it to the wrong jira.

          Suresh Srinivas added a comment -

          The attached patch makes the following changes:

          1. Currently, when an unknown datanode sends blockReport(), the namenode rejects it with an IOException. The patch now does the same when a dead datanode sends a block report.
          2. Currently, when an unknown datanode sends a blockReceived() request, the namenode rejects it with an IllegalArgumentException. I am changing this to an IOException, to be consistent with blockReport(). The same IOException is thrown when a dead datanode sends blockReceived().
          3. I have added a new test (a sketch of its shape follows this list) to ensure the following requests are rejected from dead datanodes:
            • blockReceived()
            • blockReport()
            • sendHeartBeat()
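
          For reference, the shape of such a test is sketched below. This is an illustrative sketch only, not the patch's actual test: the RPC interface and setup helper are hypothetical stand-ins for the real DatanodeProtocol and cluster setup.

            import java.io.IOException;
            import org.junit.Test;
            import static org.junit.Assert.fail;

            // Illustrative only: hypothetical stand-ins, not the actual Hadoop test code.
            public class DeadDatanodeRejectionSketch {

              /** Hypothetical view of the namenode-side handlers the test exercises. */
              interface NamenodeRpc {
                void blockReceived(String nodeId) throws IOException;
                void blockReport(String nodeId) throws IOException;
                void sendHeartBeat(String nodeId) throws IOException;
              }

              interface RpcCall { void run() throws IOException; }

              @Test
              public void deadDatanodeRequestsAreRejected() throws IOException {
                NamenodeRpc nn = startNamenodeWithDeadNode("dn-1"); // hypothetical setup helper

                expectRejected(() -> nn.blockReceived("dn-1"), "blockReceived");
                expectRejected(() -> nn.blockReport("dn-1"), "blockReport");
                expectRejected(() -> nn.sendHeartBeat("dn-1"), "sendHeartBeat");
              }

              private void expectRejected(RpcCall call, String name) {
                try {
                  call.run();
                  fail(name + " from a dead datanode should have thrown IOException");
                } catch (IOException expected) {
                  // the namenode rejects the request because the node is marked dead
                }
              }

              private NamenodeRpc startNamenodeWithDeadNode(String nodeId) {
                throw new UnsupportedOperationException("illustrative sketch only");
              }
            }
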
          Konstantin Shvachko added a comment -
          1. Some more candidates; could you please verify and add an isAlive() condition if necessary:
            • commitBlockSynchronization()
            • updatePipeline()
          2. You used "dead or unknown" twice and "unknown or dead" in one other place. It would be better to keep this consistent.
          Suresh Srinivas added a comment -

          updatePipeline() is not a DatanodeProtocol method.

          We need to add the same checks to the commitBlockSynchronization() and reportBadBlocks() methods. Currently these methods do not take a DatanodeRegistration parameter; without it, checking whether the node is registered or alive is not possible. I will create a separate jira to track this.

          Note that the registered/alive check is not done for errorReport(), since that method only logs an error.

          Suresh Srinivas added a comment -

          The patch changes exception and log messages to consistently use "unregistered or dead nodes" in the text.

          Suresh Srinivas added a comment -

          HDFS-1270 tracks adding a DatanodeRegistration parameter to the DatanodeProtocol methods.

          Konstantin Shvachko added a comment -

          Yes, I agree that commitBlockSynchronization() and reportBadBlocks() should have the registration parameter and check isAlive(), to be fixed in HDFS-1270.

          +1 for the patch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12448107/HDFS-1250.1.patch
          against trunk revision 957669.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/414/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/414/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/414/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/414/console

          This message is automatically generated.

          Suresh Srinivas added a comment -

          The failed tests, TestBlockToken and TestJspHelper, are unrelated to this patch.

          Suresh Srinivas added a comment -

          y20 version of the patch

          Suresh Srinivas added a comment -

          I committed the patch to trunk.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #325 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/325/)
          HDFS-1250. Namenode should reject block reports and block received requests from dead datanodes. Contributed by Suresh Srinivas.

          Konstantin Shvachko added a comment -

          +1 for the y20 version.


            People

             • Assignee:
               Suresh Srinivas
             • Reporter:
               Suresh Srinivas
             • Votes:
               0
             • Watchers:
               3
