Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-1502

TestBlockRecovery triggers NPE in assert

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0, 0.23.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Testcase: testRBW_RWRReplicas took 10.333 sec
              Caused an ERROR
      null
      java.lang.NullPointerException
              at org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:1881)
              at org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery.testSyncReplicas(TestBlockRecovery.java:144)
              at org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery.testRBW_RWRReplicas(TestBlockRecovery.java:305)
      
              Block reply = r.datanode.updateReplicaUnderRecovery(
                  r.rInfo, recoveryId, newBlock.getNumBytes());
              assert reply.equals(newBlock) &&
                     reply.getNumBytes() == newBlock.getNumBytes() :
                "Updated replica must be the same as the new block.";    <----- line 1881
      

      Not sure how reply could be null since updateReplicaUnderRecovery always returns a newly instantiated object.

      1. HDFS-1502.patch
        2 kB
        Konstantin Boudnik
      2. fixTestBlockRecovery.patch
        5 kB
        Hairong Kuang

        Activity

        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-22-branch #35 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-22-branch/35/)

        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-22-branch #35 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-22-branch/35/ )
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #643 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/643/)

        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #643 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/643/ )
        Hide
        Konstantin Boudnik added a comment -

        This fix has been mistakenly merged into branch-0.21. It has been reversed.

        Show
        Konstantin Boudnik added a comment - This fix has been mistakenly merged into branch-0.21 . It has been reversed.
        Hide
        Konstantin Boudnik added a comment -

        I have just committed it. Thanks Hairong!

        Show
        Konstantin Boudnik added a comment - I have just committed it. Thanks Hairong!
        Hide
        Konstantin Boudnik added a comment -
        -1 overall.  
        
            +1 @author.  The patch does not contain any @author tags.
        
            +1 tests included.  The patch appears to include 3 new or modified tests.
        
            +1 javadoc.  The javadoc tool did not generate any warning messages.
        
            +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
        
            +1 findbugs.  The patch does not introduce any new Findbugs warnings.
        
            -1 release audit.  The applied patch generated 97 release audit warnings (more than the trunk's current 1 warnings).
        
            +1 system test framework.  The patch passed system test framework compile.
        

        97 warnings are addressed by HDFS-1511. I'm going to commit this shortly.

        Show
        Konstantin Boudnik added a comment - -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. -1 release audit. The applied patch generated 97 release audit warnings (more than the trunk's current 1 warnings). +1 system test framework. The patch passed system test framework compile. 97 warnings are addressed by HDFS-1511 . I'm going to commit this shortly.
        Hide
        Konstantin Boudnik added a comment -

        Thanks Hirong - all test cases are passing for me. I''l run the validation and will commit the patch if everything is Ok.

        +1 on the patch.

        Show
        Konstantin Boudnik added a comment - Thanks Hirong - all test cases are passing for me. I''l run the validation and will commit the patch if everything is Ok. +1 on the patch.
        Hide
        Hairong Kuang added a comment -

        This patch should be able to fix the problem.

        Show
        Hairong Kuang added a comment - This patch should be able to fix the problem.
        Hide
        Konstantin Shvachko added a comment -

        Oh, and there is a redundant import org.mockito.Matchers

        Show
        Konstantin Shvachko added a comment - Oh, and there is a redundant import org.mockito.Matchers
        Hide
        Konstantin Shvachko added a comment -

        Your fix caused the problem with testRBWReplicas & testRWRReplicas.
        You mocked InterDatanodeProtocol.updateReplicaUnderRecovery(oldBlock, recoveryId, newLen) to return the oldBlock.
        But updateReplicaUnderRecovery() is supposed to update the old replica with the newGS=recoveryId and newLen.
        Good thing DataNode assert caught you.

        Show
        Konstantin Shvachko added a comment - Your fix caused the problem with testRBWReplicas & testRWRReplicas. You mocked InterDatanodeProtocol.updateReplicaUnderRecovery(oldBlock, recoveryId, newLen) to return the oldBlock . But updateReplicaUnderRecovery() is supposed to update the old replica with the newGS=recoveryId and newLen . Good thing DataNode assert caught you.
        Hide
        Konstantin Boudnik added a comment -

        I'd suggest to commit and close NPE issue with this patch. For the failing tests we can open a separate one.

        Show
        Konstantin Boudnik added a comment - I'd suggest to commit and close NPE issue with this patch. For the failing tests we can open a separate one.
        Hide
        Konstantin Boudnik added a comment -

        With the patch only these two cases testRBWReplicas & testRWRReplicas keep failing.

        Show
        Konstantin Boudnik added a comment - With the patch only these two cases testRBWReplicas & testRWRReplicas keep failing.
        Hide
        Konstantin Boudnik added a comment -

        Looking somewhat more into this I see in a debugger that reply.getNumBytes() is 6000 newBlock.getNumBytes() is only 5000 (as set in the test class). So they are different which causes assertion to happen.

                if(rState == bestState) {
                  minLength = Math.min(minLength, r.rInfo.getNumBytes());
                  participatingList.add(r);
                }
        

        The difference is set by the code above, apparently. I am not that well familiar with recovery process in particular, but it seems like there are some issues with this part of the code. Thoughts?

        Show
        Konstantin Boudnik added a comment - Looking somewhat more into this I see in a debugger that reply.getNumBytes() is 6000 newBlock.getNumBytes() is only 5000 (as set in the test class). So they are different which causes assertion to happen. if(rState == bestState) { minLength = Math.min(minLength, r.rInfo.getNumBytes()); participatingList.add(r); } The difference is set by the code above, apparently. I am not that well familiar with recovery process in particular, but it seems like there are some issues with this part of the code. Thoughts?
        Hide
        Konstantin Boudnik added a comment -

        I have that patch which fixes NPE issue, but I still see two testcases failing on the assertion in the same line where NPE used to happen.

        Show
        Konstantin Boudnik added a comment - I have that patch which fixes NPE issue, but I still see two testcases failing on the assertion in the same line where NPE used to happen.
        Hide
        Todd Lipcon added a comment -

        I was looking into this yesterday. The issue is that we're calling it with a mock DataNode object in the test, but the mock isn't told what to return.

        We need an Answer interface to use in this test that just returns the passed in block.

        Show
        Todd Lipcon added a comment - I was looking into this yesterday. The issue is that we're calling it with a mock DataNode object in the test, but the mock isn't told what to return. We need an Answer interface to use in this test that just returns the passed in block.

          People

          • Assignee:
            Hairong Kuang
            Reporter:
            Eli Collins
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development