Hadoop HDFS / HDFS-2359

NPE found in DataNode log when disks fail during various HDFS operations

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.205.0
    • Fix Version/s: 0.20.205.0
    • Component/s: datanode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Scenario:
      I have a cluster of 4 DataNodes, each with 12 disks.

      In hdfs-site.xml I have "dfs.datanode.failed.volumes.tolerated=3".

      During the execution of distcp (hdfs->hdfs), I fail 3 disks in one DataNode by setting the data directory permissions to 000. The distcp job succeeds, but I get a NullPointerException in the DataNode log.

      In one thread
      $ hadoop distcp /user/$HADOOPQA_USER/data1 /user/$HADOOPQA_USER/data3

      In another thread in a datanode
      $ chmod 000 /xyz/{0,1,2}/hadoop/var/hdfs/data

      where [ dfs.data.dir is set as /xyz/{0..11}/hadoop/var/hdfs/data ]
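The permission change described above can be tried safely on scratch directories before touching a real DataNode. The following is only a sketch under stated assumptions: `ROOT` is a hypothetical temporary directory standing in for /xyz, no Hadoop process is involved, and the script only checks that three of the twelve stand-in volumes end up with mode 000.

```shell
#!/usr/bin/env bash
# Sketch: simulate "failed" volumes on scratch directories (no real DataNode involved).
set -euo pipefail

# Hypothetical scratch root standing in for /xyz; override with ROOT=... if desired.
ROOT="${ROOT:-$(mktemp -d)}"

# Create 12 stand-in data directories, mirroring /xyz/{0..11}/hadoop/var/hdfs/data.
for i in $(seq 0 11); do
  mkdir -p "$ROOT/$i/hadoop/var/hdfs/data"
done

# "Fail" volumes 0, 1 and 2 by removing all permissions, as in the report.
for i in 0 1 2; do
  chmod 000 "$ROOT/$i/hadoop/var/hdfs/data"
done

# Count directories whose mode string shows no permission bits at all.
failed=0
for i in $(seq 0 11); do
  mode=$(ls -ld "$ROOT/$i/hadoop/var/hdfs/data" | cut -c1-10)
  if [ "$mode" = "d---------" ]; then
    failed=$((failed + 1))
  fi
done
echo "failed volumes: $failed"

# Restore permissions so the scratch tree can be cleaned up afterwards.
for i in 0 1 2; do
  chmod 755 "$ROOT/$i/hadoop/var/hdfs/data"
done
```

The count is taken from the mode string rather than from a write test because a process running as root can still write to a directory with mode 000.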

      Log Snippet from the Datanode
      =============

      2011-09-19 12:43:40,314 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected error trying to delete block
      blk_7065198814142552283_62557. BlockInfo not found in volumeMap.
      2011-09-19 12:43:40,314 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected error trying to delete block
      blk_7066946313092770579_39189. BlockInfo not found in volumeMap.
      2011-09-19 12:43:40,314 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected error trying to delete block
      blk_7070305189404753930_49359. BlockInfo not found in volumeMap.
      2011-09-19 12:43:40,327 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing datanode Command
      java.io.IOException: Error in deleting blocks.
      at org.apache.hadoop.hdfs.server.datanode.FSDataset.invalidate(FSDataset.java:1820)
      at org.apache.hadoop.hdfs.server.datanode.DataNode.processCommand(DataNode.java:1074)
      at org.apache.hadoop.hdfs.server.datanode.DataNode.processCommand(DataNode.java:1036)
      at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:891)
      at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1419)
      at java.lang.Thread.run(Thread.java:619)
      2011-09-19 12:43:41,304 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
      DatanodeRegistration(xx.xxx.xxx.xxx:xxxx, storageID=xx-xxxxxxxxxxxx-xx.xxx.xxx.xxx-xxxx-xxxxxxxxxxx, infoPort=1006,
      ipcPort=8020):DataXceiver
      java.lang.NullPointerException
      at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner$LogFileHandler.appendLine(DataBlockScanner.java:788)
      at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.updateScanStatusInternal(DataBlockScanner.java:365)
      at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.verifiedByClient(DataBlockScanner.java:308)
      at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:205)
      at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
      at java.lang.Thread.run(Thread.java:619)
      2011-09-19 12:43:43,313 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected error trying to delete block
      blk_7071818644980664768_40827. BlockInfo not found in volumeMap.
      2011-09-19 12:43:43,313 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected error trying to delete block
      blk_7073840977856837621_62108. BlockInfo not found in volumeMap.
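To check whether a DataNode log contains the two symptoms shown in the snippet, one option is to count the matching lines with grep. This is a rough sketch, not part of the original report: the log path is an assumption supplied as the first argument, and when no argument is given the script falls back to a small sample built from lines in the snippet above.

```shell
#!/usr/bin/env bash
# Sketch: count NPEs and "BlockInfo not found" warnings in a DataNode log.
set -euo pipefail

# Hypothetical usage: pass the real DataNode log path as the first argument.
LOG="${1:-}"
if [ -z "$LOG" ]; then
  # No log given: fall back to a short sample taken from the snippet above.
  LOG="$(mktemp)"
  cat > "$LOG" <<'EOF'
java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner$LogFileHandler.appendLine(DataBlockScanner.java:788)
blk_7071818644980664768_40827. BlockInfo not found in volumeMap.
EOF
fi

# grep -c exits non-zero when nothing matches, so "|| true" keeps set -e from aborting.
npe_count=$(grep -c 'java.lang.NullPointerException' "$LOG" || true)
missing_count=$(grep -c 'BlockInfo not found in volumeMap' "$LOG" || true)
echo "NPEs: $npe_count, missing BlockInfo warnings: $missing_count"
```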

        Activity

        Aaron T. Myers added a comment -

        Looks like the description of this JIRA should say "NameNode", not "DataNode", right?

        Rajit Saha added a comment -

        Thanks Aaron, it's Datanode, my bad.

        Rajit Saha added a comment -

        Also, I have seen that this NullPointerException can happen in any of these
        scenarios: dfs -copyFromLocal/-put/-cp/-rm or distcp. It's not definitive. I have
        seen this many times on different occasions, but every time it happens when I
        try to simulate a disk failure by changing the permissions of the data dir to 000.

        Suresh Srinivas added a comment -

        Rajit, in what release is this happening? Could you please set the Affects Version/s to the release where you found this problem?

        Rajit Saha added a comment -

        Suresh, thanks for correcting. I am seeing this in the unreleased 0.20.205 version.

        Jonathan Eagles added a comment -

        This NPE was fixed in trunk as part of HDFS-1655. Cherry picking the NPE fix to create the patch for this JIRA.

        Jonathan Eagles added a comment -

        No tests were added due to the difficulty of testing a private method of a static inner class.

        Jonathan Eagles added a comment -

        [exec] -1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
        [exec] Please justify why no tests are needed for this patch.
        [exec]
        [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        Jonathan Eagles added a comment -

        See above comment on why no tests were provided. This patch should go into branch-0.20-security and branch-0.20-security-205.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12496504/HDFS-2359-branch-0.20-security.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1290//console

        This message is automatically generated.

        Suresh Srinivas added a comment -

        +1 for the patch.

        Suresh Srinivas added a comment -

        I have committed this patch. Thank you Jonathan for the patch.

        Jonathan Eagles added a comment -

        Thanks, Suresh!

        Matt Foley added a comment -

        Closed upon release of 0.20.205.0


          People

          • Assignee: Jonathan Eagles
          • Reporter: Rajit Saha
          • Votes: 0
          • Watchers: 1
