Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-3181

testHardLeaseRecoveryAfterNameNodeRestart fails when length before restart is 1 byte less than CRC chunk size

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: 2.0.0-alpha
    • Component/s: test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart seems to be failing intermittently on jenkins.

      org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart
      Failing for the past 1 build (Since Failed#2163 )
      Took 8.4 sec.
      Error Message
      
      Lease mismatch on /hardLeaseRecovery owned by HDFS_NameNode but is accessed by DFSClient_NONMAPREDUCE_1147689755_1  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2076)  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2051)  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:1983)  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:492)  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:311)  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42604)  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:417)  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:891)  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1661)  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1657)  at java.security.AccessController.doPrivileged(Native Method)  at javax.security.auth.Subject.doAs(Subject.java:396)  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1205)  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1655) 
      
      Stacktrace
      
      org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /hardLeaseRecovery owned by HDFS_NameNode but is accessed by DFSClient_NONMAPREDUCE_1147689755_1
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2076)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2051)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:1983)
      	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:492)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:311)
      	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42604)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:417)
              ...
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
      	at $Proxy15.getAdditionalDatanode(Unknown Source)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:317)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:828)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:741)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:416)
      
      1. testOut.txt
        120 kB
        Colin Patrick McCabe
      2. TestLeaseRecovery2with1535.patch
        0.7 kB
        Tsz Wo Nicholas Sze
      3. repro.txt
        0.7 kB
        Todd Lipcon
      4. h3181_20120425.patch
        3 kB
        Tsz Wo Nicholas Sze
      5. h3181_20120426.patch
        5 kB
        Tsz Wo Nicholas Sze

        Issue Links

          Activity

          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1062 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1062/)
          HDFS-3181. Fix a test case in TestLeaseRecovery2. (Revision 1331138)

          Result = FAILURE
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1331138
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecovery2.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1062 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1062/ ) HDFS-3181 . Fix a test case in TestLeaseRecovery2. (Revision 1331138) Result = FAILURE szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1331138 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecovery2.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1027 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1027/)
          HDFS-3181. Fix a test case in TestLeaseRecovery2. (Revision 1331138)

          Result = FAILURE
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1331138
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecovery2.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1027 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1027/ ) HDFS-3181 . Fix a test case in TestLeaseRecovery2. (Revision 1331138) Result = FAILURE szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1331138 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecovery2.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2160 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2160/)
          HDFS-3181. Fix a test case in TestLeaseRecovery2. (Revision 1331138)

          Result = ABORTED
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1331138
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecovery2.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #2160 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2160/ ) HDFS-3181 . Fix a test case in TestLeaseRecovery2. (Revision 1331138) Result = ABORTED szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1331138 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecovery2.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Todd, thanks for reviewing the first patch.

          Show
          Tsz Wo Nicholas Sze added a comment - Todd, thanks for reviewing the first patch.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2143 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2143/)
          HDFS-3181. Fix a test case in TestLeaseRecovery2. (Revision 1331138)

          Result = SUCCESS
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1331138
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecovery2.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #2143 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2143/ ) HDFS-3181 . Fix a test case in TestLeaseRecovery2. (Revision 1331138) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1331138 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecovery2.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2217 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2217/)
          HDFS-3181. Fix a test case in TestLeaseRecovery2. (Revision 1331138)

          Result = SUCCESS
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1331138
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecovery2.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #2217 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2217/ ) HDFS-3181 . Fix a test case in TestLeaseRecovery2. (Revision 1331138) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1331138 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecovery2.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
          Hide
          Tsz Wo Nicholas Sze added a comment -

          I have committed this.

          Show
          Tsz Wo Nicholas Sze added a comment - I have committed this.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12524660/h3181_20120426.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2338//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2338//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12524660/h3181_20120426.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified test files. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2338//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2338//console This message is automatically generated.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          When size == 0, there is a NPE.

          h3181_20120426.patch: fixes also the NPE.

          Show
          Tsz Wo Nicholas Sze added a comment - When size == 0, there is a NPE. h3181_20120426.patch: fixes also the NPE.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12524381/h3181_20120425.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.fs.TestUrlStreamHandler

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2330//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2330//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12524381/h3181_20120425.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.fs.TestUrlStreamHandler +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2330//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2330//console This message is automatically generated.
          Hide
          Todd Lipcon added a comment -

          ah! good catch. +1 pending jenkins

          Show
          Todd Lipcon added a comment - ah! good catch. +1 pending jenkins
          Hide
          Tsz Wo Nicholas Sze added a comment -

          The test writes one byte outside the try-block but an exception is thrown.

          h3181_20120425.patch:

               // make sure that the client can't write data anymore.
          -    stm.write('b');
               try {
          +      stm.write('b');
                 stm.hflush();
                 fail("Should not be able to flush after we've lost the lease");
               } catch (IOException e) {
          -      LOG.info("Expceted exception on hflush", e);
          +      LOG.info("Expceted exception on write/hflush", e);
               }
          
          Show
          Tsz Wo Nicholas Sze added a comment - The test writes one byte outside the try-block but an exception is thrown. h3181_20120425.patch: // make sure that the client can't write data anymore. - stm.write('b'); try { + stm.write('b'); stm.hflush(); fail( "Should not be able to flush after we've lost the lease" ); } catch (IOException e) { - LOG.info( "Expceted exception on hflush" , e); + LOG.info( "Expceted exception on write/hflush" , e); }
          Hide
          Tsz Wo Nicholas Sze added a comment -

          It turns out that it is a bug in the test.

          Show
          Tsz Wo Nicholas Sze added a comment - It turns out that it is a bug in the test.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Hi Todd, thanks for clarifying it. I can reproduce the failure now.

          Show
          Tsz Wo Nicholas Sze added a comment - Hi Todd, thanks for clarifying it. I can reproduce the failure now.
          Hide
          Todd Lipcon added a comment -

          Jenkins seems to have reproduced above

          Show
          Todd Lipcon added a comment - Jenkins seems to have reproduced above
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12521105/repro.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hdfs.TestLeaseRecovery2

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2167//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2167//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12521105/repro.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.TestLeaseRecovery2 +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2167//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2167//console This message is automatically generated.
          Hide
          Todd Lipcon added a comment -

          But you're right that it must be chunk-size related rather than block-size. Perhaps it is specifically the first checksum chunk of a block (1535 == block size + chunk size - 1)

          Show
          Todd Lipcon added a comment - But you're right that it must be chunk-size related rather than block-size. Perhaps it is specifically the first checksum chunk of a block (1535 == block size + chunk size - 1)
          Hide
          Todd Lipcon added a comment -

          I think your patch was changing the wrong AppendTestUtil.nextInt() call. Here's a patch which changes the one called by the test in question

          Show
          Todd Lipcon added a comment - I think your patch was changing the wrong AppendTestUtil.nextInt() call. Here's a patch which changes the one called by the test in question
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12521094/TestLeaseRecovery2with1535.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2166//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2166//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12521094/TestLeaseRecovery2with1535.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2166//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2166//console This message is automatically generated.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          TestLeaseRecovery2with1535.patch: see if Jerkins can reproduce the failure.

          Show
          Tsz Wo Nicholas Sze added a comment - TestLeaseRecovery2with1535.patch: see if Jerkins can reproduce the failure.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          I have run it in a loop for many times but still could not see one failure. Did you see the failure on truck or some other versions?

          Show
          Tsz Wo Nicholas Sze added a comment - I have run it in a loop for many times but still could not see one failure. Did you see the failure on truck or some other versions?
          Hide
          Tsz Wo Nicholas Sze added a comment -
          • The block size is 1024. Why 1535 is "1 byte less than block size"? Do you mean chunk size per CRC or something else?
          • I cannot reproduce the failure for file size = 1535 or 2*BLOCK_SIZE-1.
          Show
          Tsz Wo Nicholas Sze added a comment - The block size is 1024. Why 1535 is "1 byte less than block size"? Do you mean chunk size per CRC or something else? I cannot reproduce the failure for file size = 1535 or 2*BLOCK_SIZE-1.
          Hide
          Todd Lipcon added a comment -

          FWIW, turning off the "replace datanode" feature in the pipeline yields a slightly different failure:

          java.io.IOException: BP-889532485-127.0.0.1-1333416170795:blk_137299169498047450_1003 does not exist or is not under Constructionblk_137299169498047450_1003{blockUCState=UNDER_RECOVERY, primaryNodeIndex=0, replicas=[ReplicaUnderConstruction[127.0.0.1:38024|RBW], ReplicaUnderConstruction[127.0.0.1:42350|RBW], ReplicaUnderConstruction[127.0.0.1:42129|RBW]]}
          	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:4375)
          	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:4439)
          	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:534)
          	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:746)
          	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42664)
          	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:417)
          	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:891)
          	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1661)
          	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1)
          	at java.security.AccessController.doPrivileged(Native Method)
          	at javax.security.auth.Subject.doAs(Subject.java:416)
          	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1205)
          	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1656)
          
          	at org.apache.hadoop.ipc.Client.call(Client.java:1159)
          	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:185)
          	at $Proxy15.updateBlockForPipeline(Unknown Source)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at java.lang.reflect.Method.invoke(Method.java:616)
          	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
          	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
          	at $Proxy15.updateBlockForPipeline(Unknown Source)
          	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:746)
          	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:934)
          	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:741)
          	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:416)
          
          Show
          Todd Lipcon added a comment - FWIW, turning off the "replace datanode" feature in the pipeline yields a slightly different failure: java.io.IOException: BP-889532485-127.0.0.1-1333416170795:blk_137299169498047450_1003 does not exist or is not under Constructionblk_137299169498047450_1003{blockUCState=UNDER_RECOVERY, primaryNodeIndex=0, replicas=[ReplicaUnderConstruction[127.0.0.1:38024|RBW], ReplicaUnderConstruction[127.0.0.1:42350|RBW], ReplicaUnderConstruction[127.0.0.1:42129|RBW]]} at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:4375) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:4439) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:534) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:746) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42664) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:417) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:891) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1661) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1205) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1656) at org.apache.hadoop.ipc.Client.call(Client.java:1159) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:185) at $Proxy15.updateBlockForPipeline(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at $Proxy15.updateBlockForPipeline(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:746) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:934) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:741) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:416)
          Hide
          Todd Lipcon added a comment -

          I read through the logs a bit, and realized that this is entirely reproducible when the random size chosen is 1 byte less than a multiple of the block size. The following change causes it to reproduce every time:

          index a374e50..bd51984 100644
          --- a/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecovery2.java
          +++ b/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecovery2.java
          @@ -426,7 +426,7 @@ public class TestLeaseRecovery2 {
               assertTrue(dfs.dfs.exists(fileStr));
           
               // write bytes into the file.
          -    int size = AppendTestUtil.nextInt(FILE_SIZE);
          +    int size = 1535;//AppendTestUtil.nextInt(FILE_SIZE);
               AppendTestUtil.LOG.info("size=" + size);
               stm.write(buffer, 0, size);
          

          We should definitely look into this.

          Show
          Todd Lipcon added a comment - I read through the logs a bit, and realized that this is entirely reproducible when the random size chosen is 1 byte less than a multiple of the block size. The following change causes it to reproduce every time: index a374e50..bd51984 100644 --- a/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecovery2.java +++ b/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecovery2.java @@ -426,7 +426,7 @@ public class TestLeaseRecovery2 { assertTrue(dfs.dfs.exists(fileStr)); // write bytes into the file. - int size = AppendTestUtil.nextInt(FILE_SIZE); + int size = 1535; //AppendTestUtil.nextInt(FILE_SIZE); AppendTestUtil.LOG.info( "size=" + size); stm.write(buffer, 0, size); We should definitely look into this.
          Hide
          Colin Patrick McCabe added a comment -

          exception, standard output, etc of a failure

          Show
          Colin Patrick McCabe added a comment - exception, standard output, etc of a failure

            People

            • Assignee:
              Tsz Wo Nicholas Sze
              Reporter:
              Colin Patrick McCabe
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development