Hadoop HDFS / HDFS-3332

NullPointerException in DN when directoryscanner is trying to report bad blocks

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 2.0.0-alpha, 3.0.0
    • Component/s: datanode
    • Labels:
      None
    • Environment:

      HDFS

    • Hadoop Flags:
      Reviewed
    • Target Version/s:

      Description

      The cluster has 1 NN and 1 DN (the NN is started with an HA configuration).
      I corrupted 1 block and found the following in the DN log:

      2012-04-27 09:59:01,214 INFO  datanode.DataNode (BPServiceActor.java:blockReport(401)) - BlockReport of 2 blocks took 0 msec to generate and 5 msecs for RPC and NN processing
      2012-04-27 09:59:01,214 INFO  datanode.DataNode (BPServiceActor.java:blockReport(420)) - sent block report, processed command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@3b756db3
      2012-04-27 09:59:01,726 INFO  datanode.DirectoryScanner (DirectoryScanner.java:scan(390)) - BlockPool BP-2087868617-10.18.40.95-1335500488012 Total blocks: 2, missing metadata files:0, missing block files:0, missing blocks in memory:0, mismatched blocks:1
      2012-04-27 09:59:01,727 WARN  impl.FsDatasetImpl (FsDatasetImpl.java:checkAndUpdate(1366)) - Updating size of block -4466699320171028643 from 1024 to 1034
      2012-04-27 09:59:01,727 WARN  impl.FsDatasetImpl (FsDatasetImpl.java:checkAndUpdate(1374)) - Reporting the block blk_-4466699320171028643_1004 as corrupt due to length mismatch
      2012-04-27 09:59:01,728 DEBUG ipc.Client (Client.java:sendParam(807)) - IPC Client (1957050620) connection to /10.18.40.95:8020 from root sending #257
      2012-04-27 09:59:01,730 DEBUG ipc.Client (Client.java:receiveResponse(848)) - IPC Client (1957050620) connection to /10.18.40.95:8020 from root got value #257
      2012-04-27 09:59:01,730 DEBUG ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(193)) - Call: reportBadBlocks 2
      2012-04-27 09:59:01,731 ERROR datanode.DirectoryScanner (DirectoryScanner.java:run(288)) - Exception during DirectoryScanner execution - will continue next cycle
      java.lang.NullPointerException
      	at org.apache.hadoop.hdfs.protocol.DatanodeID.<init>(DatanodeID.java:66)
      	at org.apache.hadoop.hdfs.protocol.DatanodeInfo.<init>(DatanodeInfo.java:87)
      	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reportBadBlocks(BPServiceActor.java:238)
      	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.reportBadBlocks(BPOfferService.java:187)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:559)
      	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkAndUpdate(FsDatasetImpl.java:1377)
      	at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:318)
      	at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:284)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
      	at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
      	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      	at java.lang.Thread.run(Thread.java:619)
      

      Here, when the DirectoryScanner tries to report the bad block, we get an NPE.
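
The top frames of the trace (DatanodeID.<init> called from DatanodeInfo.<init>) suggest a copy-style constructor reading fields from a null argument. The following is only a minimal sketch of that failure pattern. NodeId and NullCopyDemo are hypothetical stand-ins, not the real Hadoop classes, and the field names and values are assumptions.

```java
final class NullCopyDemo {
    /** Hypothetical stand-in for an ID type with a copy constructor. */
    static final class NodeId {
        final String ipAddr;
        final int xferPort;

        NodeId(String ipAddr, int xferPort) {
            this.ipAddr = ipAddr;
            this.xferPort = xferPort;
        }

        // Copy constructor: reads fields of 'from', so a null argument throws an NPE here.
        NodeId(NodeId from) {
            this(from.ipAddr, from.xferPort);
        }
    }

    /** Returns true if copying the given id throws a NullPointerException. */
    static boolean copyThrowsNpe(NodeId id) {
        try {
            new NodeId(id);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("copy of null throws NPE: " + copyThrowsNpe(null));
        System.out.println("copy of real id throws NPE: "
            + copyThrowsNpe(new NodeId("dn-host", 50010)));
    }
}
```

In this sketch the exception surfaces inside the constructor rather than at the call site, which matches the shape of the trace above.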

        Activity

        Tsz Wo Nicholas Sze added a comment -

        This is probably related to the recent DatanodeID changes: HDFS-3164, HDFS-3138, HDFS-3171, HDFS-3144, HDFS-3216.

        amith added a comment -

        Hi Nicholas,
        Please correct me if I am wrong.

        I have the NN started with an HA configuration (nn1=40.95 and nn2=40.96; nn2 is not started).

        I started only 1 NN and made it active, then wrote a file and corrupted it manually.
        The DirectoryScanner reports the bad block to all the NNs via BPServiceActor.

        BPServiceActor#reportBadBlocks(ExtendedBlock block) does not check whether the DN has registered with the NN.
        It tries to report bad blocks using bpRegistration, which is null, causing the NPE.

        void reportBadBlocks(ExtendedBlock block) {
            DatanodeInfo[] dnArr = { new DatanodeInfo(bpRegistration) };
            LocatedBlock[] blocks = { new LocatedBlock(block, dnArr) };
        

        Why is bpRegistration null?

        private void connectToNNAndHandshake() throws IOException {
            // get NN proxy
            bpNamenode = dn.connectToNN(nnAddr);

            // First phase of the handshake with NN - get the namespace
            // info.
            NamespaceInfo nsInfo = retrieveNamespaceInfo();

            // Verify that this matches the other NN in this HA pair.
            // This also initializes our block pool in the DN if we are
            // the first NN connection for this BP.
            bpos.verifyAndSetNamespaceInfo(nsInfo);

            // Second phase of the handshake with the NN.
            register();
        }
        

        bpRegistration is assigned in the register() call. But retrieveNamespaceInfo() is effectively an infinite loop that keeps trying to get the version from the NN:

        NamespaceInfo retrieveNamespaceInfo() throws IOException {
            NamespaceInfo nsInfo = null;
            while (shouldRun()) {
              try {
                nsInfo = bpNamenode.versionRequest();
                LOG.debug(this + " received versionRequest response: " + nsInfo);
                break;
              } catch(SocketTimeoutException e) {  // namenode is busy
                LOG.warn("Problem connecting to server: " + nnAddr);
              } catch(IOException e ) {  // namenode is not available
                LOG.warn("Problem connecting to server: " + nnAddr);
              }

              // try again in 5 seconds
              sleepAndLogInterrupts(5000, "requesting version info from NN");
            }

            if (nsInfo != null) {
              checkNNVersion(nsInfo);
            } else {
              throw new IOException("DN shut down before block pool connected");
            }
            return nsInfo;
        }
        

        so, for an NN that is not running, register() is never reached and bpRegistration is never assigned.
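
The ordering described in this comment can be sketched as follows: registration is only assigned after the version request succeeds, so an unreachable NN leaves it null. All names here (HandshakeSketch, NameNode, Actor) are hypothetical simplifications, not Hadoop APIs, and the retry loop is bounded by an assumed attempt counter so the demo terminates instead of sleeping and retrying indefinitely.

```java
import java.io.IOException;

final class HandshakeSketch {
    /** Hypothetical stand-in for the NN RPC proxy. */
    interface NameNode {
        String versionRequest() throws IOException;
    }

    /** Hypothetical stand-in for one per-NN actor on the DN. */
    static final class Actor {
        private final NameNode nn;
        private int remainingAttempts;   // bounds the retry loop so the demo terminates
        volatile String registration;    // stays null until phase 2 runs

        Actor(NameNode nn, int maxAttempts) {
            this.nn = nn;
            this.remainingAttempts = maxAttempts;
        }

        private boolean shouldRun() {
            return remainingAttempts-- > 0;
        }

        void connectAndHandshake() throws IOException {
            String nsInfo = null;
            while (shouldRun()) {
                try {
                    nsInfo = nn.versionRequest();   // phase 1: get namespace info
                    break;
                } catch (IOException e) {
                    // NN not available: retry (the quoted code sleeps between attempts)
                }
            }
            if (nsInfo == null) {
                throw new IOException("shut down before block pool connected");
            }
            // phase 2: only reached after phase 1 has succeeded
            registration = "registered:" + nsInfo;
        }
    }

    /** Runs the handshake against the given NN and returns the resulting registration. */
    static String registrationAfterHandshake(NameNode nn) {
        Actor actor = new Actor(nn, 3);
        try {
            actor.connectAndHandshake();
        } catch (IOException e) {
            // handshake never completed; registration is left untouched
        }
        return actor.registration;
    }

    public static void main(String[] args) {
        NameNode up = () -> "ns-1";
        NameNode down = () -> { throw new IOException("connection refused"); };
        System.out.println("NN up:   " + registrationAfterHandshake(up));
        System.out.println("NN down: " + registrationAfterHandshake(down));
    }
}
```

Any code path that reads the registration field while the actor is still stuck in phase 1 sees null, which is the situation the bad-block report runs into.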

        Tsz Wo Nicholas Sze added a comment -

        Hi Amith, do you mean that you configured both active and standby NNs but only started the active NN?

        amith added a comment -

        Yes Nicholas, I have configured 2 NNs but started only 1, as active.

        Tsz Wo Nicholas Sze added a comment -

        So, the problem is that the DN has not registered with the standby NN but still tries to report bad blocks to it. We should probably check whether the registration is null before reporting the bad blocks.
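
A null check of the kind suggested here might look like the sketch below. This is not the committed patch, which is not shown in this thread. GuardedReportSketch and Actor are hypothetical stand-ins, and silently skipping the report (rather than queueing it for later) is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.List;

final class GuardedReportSketch {
    /** Hypothetical stand-in for one per-NN actor on the DN. */
    static final class Actor {
        volatile String registration;   // null until the handshake with this NN completes
        final List<String> sentReports = new ArrayList<>();

        /** Returns true if the report was sent, false if it was skipped. */
        boolean reportBadBlock(String blockId) {
            String reg = registration;   // read once so the check and the use see the same value
            if (reg == null) {
                return false;            // not registered with this NN yet: skip instead of throwing
            }
            sentReports.add(reg + " -> " + blockId);
            return true;
        }
    }

    public static void main(String[] args) {
        Actor actor = new Actor();
        System.out.println("before registration: sent = " + actor.reportBadBlock("blk_1"));
        actor.registration = "dn-1";
        System.out.println("after registration:  sent = " + actor.reportBadBlock("blk_1"));
    }
}
```

Copying the field into a local before the check avoids a window in which another thread changes it between the test and the use.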

        amith added a comment -

        Yes Nicholas, this is a simple defect. I will probably provide a patch today.

        amith added a comment -

        Since the exception thrown is caught internally, I couldn't write a good test case that asserts the functionality correctly.
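
The difficulty can be illustrated with a small sketch: when a periodic task catches its own exceptions (as the "will continue next cycle" log line indicates DirectoryScanner.run does), the caller never sees the failure. One possible way to make it observable is a failure counter. CaughtFailureSketch and CountingTask are hypothetical names, and this hook is an assumption for illustration, not something the patch added.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CaughtFailureSketch {
    /** Hypothetical wrapper: swallows runtime exceptions from the task body, but counts them. */
    static final class CountingTask implements Runnable {
        private final Runnable body;
        final AtomicInteger failures = new AtomicInteger();

        CountingTask(Runnable body) {
            this.body = body;
        }

        @Override
        public void run() {
            try {
                body.run();
            } catch (RuntimeException e) {
                // Nothing propagates to the caller; the counter is the only visible trace.
                failures.incrementAndGet();
            }
        }
    }

    public static void main(String[] args) {
        CountingTask task = new CountingTask(() -> {
            throw new NullPointerException("registration is null");
        });
        task.run();   // returns normally even though the body threw
        System.out.println("failures = " + task.failures.get());
    }
}
```

A test could then assert on the counter instead of expecting an exception to escape the task.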

        Tsz Wo Nicholas Sze added a comment -

        +1 patch looks good.

        I agree that it is not easy to add a new test and the change is straightforward. Please manually test it. Thanks, amith.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12525315/HDFS-3332.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2361//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2361//artifact/trunk/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2361//console

        This message is automatically generated.

        amith added a comment -

        I manually tested the patch and it works fine.
        The Findbugs link does not open, so I could not check the warning in order to clear it.

        Tsz Wo Nicholas Sze added a comment -

        The findbugs warning is not related to this.

        Uma Maheswara Rao G added a comment -

        Yes, the Findbugs warning is related to HDFS-3350.
        Thanks for the patch Amith.
        Thanks for the review Nicholas. I will commit this patch.

        amith added a comment -

        Thanks to both of you.

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #2255 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2255/)
        HDFS-3332. NullPointerException in DN when directoryscanner is trying to report bad blocks. Contributed by Amith D K. (Revision 1333587)

        Result = SUCCESS
        umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1333587
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #2181 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2181/)
        HDFS-3332. NullPointerException in DN when directoryscanner is trying to report bad blocks. Contributed by Amith D K. (Revision 1333587)

        Result = SUCCESS
        umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1333587
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
        Uma Maheswara Rao G added a comment -

        I have just committed this to trunk and branch-2.

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #2199 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2199/)
        HDFS-3332. NullPointerException in DN when directoryscanner is trying to report bad blocks. Contributed by Amith D K. (Revision 1333587)

        Result = ABORTED
        umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1333587
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1034 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1034/)
        HDFS-3332. NullPointerException in DN when directoryscanner is trying to report bad blocks. Contributed by Amith D K. (Revision 1333587)

        Result = FAILURE
        umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1333587
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1069 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1069/)
        HDFS-3332. NullPointerException in DN when directoryscanner is trying to report bad blocks. Contributed by Amith D K. (Revision 1333587)

        Result = SUCCESS
        umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1333587
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java

          People

          • Assignee: amith
          • Reporter: amith
          • Votes: 0
          • Watchers: 7
