Hadoop HDFS / HDFS-733

TestBlockReport fails intermittently

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.21.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    1. HDFS-733.patch
      1 kB
      Konstantin Boudnik
    2. HDFS-733.patch
      3 kB
      Konstantin Boudnik
    3. HDFS-733.patch
      3 kB
      Konstantin Boudnik
    4. HDFS-733.patch
      10 kB
      Konstantin Boudnik
    5. HDFS-733.2.patch
      2 kB
      Konstantin Boudnik

      Activity

      Tsz Wo Nicholas Sze added a comment -

      This should be fixed in 0.21 as well.

      Konstantin Boudnik added a comment -

      However, it shouldn't be related to 0.20.2 (as was marked before) because these tests don't exist in 0.20+.

      Konstantin Boudnik added a comment -

      It seems that the test cluster's network or disks are slower than the typical setup I was running these tests on. Because of that, it takes much longer to detect a temp. replica on the second datanode. The provided patch increases the waiting timeout and also adds some debug logging to show the progress of waiting for the replica.
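
The waiting approach described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code from the HDFS-733 patch: the class `ReplicaWait`, the method `waitFor`, and the stand-in condition are all hypothetical names, and the real test checks replica state on a datanode rather than a clock.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration; not part of Hadoop or the patch. */
final class ReplicaWait {
    /**
     * Polls the condition every pollMillis until it holds or timeoutMillis elapses.
     * Returns true if the condition was observed, false on timeout or interruption.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        int attempt = 0;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without seeing the condition
            }
            attempt++;
            // Progress logging, in the spirit of the debug output the comment mentions.
            System.out.println("Keep waiting, attempt " + attempt);
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition (an assumption): becomes true about 50 ms after start.
        boolean seen = waitFor(() -> System.currentTimeMillis() - start >= 50, 2000, 10);
        System.out.println("condition observed: " + seen);
    }
}
```

A longer timeout only helps if the awaited state is still going to appear; it cannot recover a state that has already come and gone before polling starts.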

      Konstantin Boudnik added a comment -

      Seems to be ready to go.

      Konstantin Boudnik added a comment -

      Looks like test-patch is slow today, so I've run it locally.

      There appear to be 105 release audit warnings before the patch and 105 release audit warnings after applying the patch.
      
      +1 overall.
          +1 @author.  The patch does not contain any @author tags.
          +1 tests included.  The patch appears to include 3 new or modified tests.
          +1 javadoc.  The javadoc tool did not generate any warning messages.
          +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
          +1 findbugs.  The patch does not introduce any new Findbugs warnings.
          +1 release audit.  The applied patch does not increase the total number of release audit warnings.
      

      Also, I've run the test on BSD and Linux and everything seems fine.

      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12423953/HDFS-733.patch
      against trunk revision 832540.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 3 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 findbugs. The patch does not introduce any new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      -1 core tests. The patch failed core unit tests.

      -1 contrib tests. The patch failed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/94/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/94/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/94/artifact/trunk/build/test/checkstyle-errors.html
      Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/94/console

      This message is automatically generated.

      Konstantin Shvachko added a comment -

      Looks like Hudson did not change his opinion on this.
      Looking at the log, it seems that replication has actually already happened by the time the test starts waiting for the temporary replica.

      2009-11-03 23:15:26,213 INFO  hdfs.StateChange (BlockManager.java:computeReplicationWorkForBlock(857)) - BLOCK* ask 127.0.0.1:60191 to replicate blk_3759837087059880694_1001 to datanode(s) 127.0.0.1:47781
      2009-11-03 23:15:26,427 INFO  datanode.DataNode (DataNode.java:transferBlock(1072)) - DatanodeRegistration(127.0.0.1:60191, storageID=DS-681753498-127.0.1.1-60191-1257290123417, infoPort=54109, ipcPort=41992) Starting thread to transfer block blk_3759837087059880694_1001 to 127.0.0.1:47781 
      2009-11-03 23:15:26,431 INFO  datanode.DataNode (DataXceiver.java:opWriteBlock(224)) - Receiving block blk_3759837087059880694_1001 src: /127.0.0.1:38882 dest: /127.0.0.1:47781
      2009-11-03 23:15:26,450 INFO  datanode.DataNode (DataNode.java:run(1265)) - DatanodeRegistration(127.0.0.1:60191, storageID=DS-681753498-127.0.1.1-60191-1257290123417, infoPort=54109, ipcPort=41992):Transmitted block blk_3759837087059880694_1001 to /127.0.0.1:47781
      2009-11-03 23:15:26,458 DEBUG datanode.DataNode (BlockReceiver.java:receivePacket(443)) - Receiving one packet for block blk_3759837087059880694_1001 of length 0 seqno 3 offsetInBlock 3072000 lastPacketInBlock true
      2009-11-03 23:15:26,458 DEBUG datanode.DataNode (BlockReceiver.java:receivePacket(476)) - Receiving an empty packet or the end of the block blk_3759837087059880694_1001
      2009-11-03 23:15:26,458 DEBUG datanode.DataNode (FSDataset.java:addBlock(136)) - addBlock: Moved /grid/0/hudson/hudson-slave/workspace/Hdfs-Patch-h5.grid.sp2.yahoo.net/trunk/build/test/data/dfs/data/data3/tmp/blk_3759837087059880694_1001.meta to /grid/0/hudson/hudson-slave/workspace/Hdfs-Patch-h5.grid.sp2.yahoo.net/trunk/build/test/data/dfs/data/data3/current/finalized/blk_3759837087059880694_1001.meta
      2009-11-03 23:15:26,459 DEBUG datanode.DataNode (FSDataset.java:addBlock(137)) - addBlock: Moved /grid/0/hudson/hudson-slave/workspace/Hdfs-Patch-h5.grid.sp2.yahoo.net/trunk/build/test/data/dfs/data/data3/tmp/blk_3759837087059880694 to /grid/0/hudson/hudson-slave/workspace/Hdfs-Patch-h5.grid.sp2.yahoo.net/trunk/build/test/data/dfs/data/data3/current/finalized/blk_3759837087059880694
      2009-11-03 23:15:26,459 INFO  datanode.DataNode (DataXceiver.java:opWriteBlock(369)) - Received block blk_3759837087059880694_1001 src: /127.0.0.1:38882 dest: /127.0.0.1:47781 of size 3072000
      
      2009-11-03 23:15:26,461 DEBUG hdfs.StateChange (FSNamesystem.java:blockReceived(2848)) - BLOCK* NameSystem.blockReceived: blk_3759837087059880694_1001 is received from 127.0.0.1:47781
      2009-11-03 23:15:26,461 DEBUG namenode.FSNamesystem (PendingReplicationBlocks.java:remove(87)) - Removing pending replication for blockblk_3759837087059880694_1001
      2009-11-03 23:15:26,461 DEBUG namenode.FSNamesystem (DatanodeDescriptor.java:processReportedBlock(466)) - Reported block blk_3759837087059880694_1001 on 127.0.0.1:47781 size 3072000 replicaState = FINALIZED
      2009-11-03 23:15:26,462 INFO  hdfs.StateChange (BlockManager.java:addStoredBlock(1102)) - BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:47781 is added to blk_3759837087059880694_1001 size 3072000
      

      And then you start waiting in the test:

      2009-11-03 23:15:26,471 DEBUG datanode.TestBlockReport (TestBlockReport.java:prepareSecondReplica(600)) - Replica state before the loop 0
      2009-11-03 23:15:26,571 DEBUG datanode.TestBlockReport (TestBlockReport.java:prepareSecondReplica(605)) - Keep waiting for blk_3759837087059880694 is in state 0
      2009-11-03 23:15:26,671 DEBUG datanode.TestBlockReport (TestBlockReport.java:prepareSecondReplica(605)) - Keep waiting for blk_3759837087059880694 is in state 0
      2009-11-03 23:15:26,772 DEBUG datanode.TestBlockReport (TestBlockReport.java:prepareSecondReplica(605)) - Keep waiting for blk_3759837087059880694 is in state 0
      2009-11-03 23:15:26,872 DEBUG datanode.TestBlockReport (TestBlockReport.java:prepareSecondReplica(605)) - Keep waiting for blk_3759837087059880694 is in state 0
      

      So waiting longer does not help.
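
The timing in the log excerpts above can be worked out directly from the quoted timestamps. The sketch below is only arithmetic on those four timestamps; the class and method names are made up for illustration.

```java
import java.time.Duration;
import java.time.LocalTime;

/** Hypothetical helper for illustration; only does arithmetic on the quoted log timestamps. */
final class ReplicationTimeline {
    /** Milliseconds between two log timestamps written as HH:mm:ss,SSS. */
    static long millisBetween(String from, String to) {
        return Duration.between(parse(from), parse(to)).toMillis();
    }

    private static LocalTime parse(String logTime) {
        // The log uses a comma before the milliseconds; LocalTime.parse expects a dot.
        return LocalTime.parse(logTime.replace(',', '.'));
    }

    public static void main(String[] args) {
        String receiving = "23:15:26,431"; // "Receiving block ..." on the target datanode
        String received  = "23:15:26,459"; // "Received block ... of size 3072000"
        String firstPoll = "23:15:26,471"; // "Replica state before the loop 0"
        String nextPoll  = "23:15:26,571"; // first "Keep waiting ..." line

        System.out.println("transfer window (ms): " + millisBetween(receiving, received));
        System.out.println("test started after finalization (ms): " + millisBetween(received, firstPoll));
        System.out.println("poll interval (ms): " + millisBetween(firstPoll, nextPoll));
    }
}
```

By these timestamps the block was in flight for 28 ms, the test's first check came 12 ms after the block was received, and the test polls every 100 ms, so the temporary replica was gone before the first check.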

      Konstantin Boudnik added a comment -

      Well, we had two possible assumptions:

      • we aren't waiting long enough because the build cluster's nodes are too slow (doesn't seem to be the case)
      • we start waiting when everything has already happened (the total opposite of the above)

      As the first assumption turns out to be false, the next step would be to try to increase the size of the block to, say, 60M instead of 6M. Maybe this will give us enough leverage to make this test do the right thing. If not, I guess the test scenario will have to be reconsidered.
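
A rough estimate of how much a larger block might widen the window, using figures from the log quoted earlier in this thread (a 3072000-byte block received between 23:15:26,431 and 23:15:26,459). It assumes transfer time scales linearly with block size, which is only an approximation; the class and method names are hypothetical.

```java
/** Hypothetical back-of-the-envelope estimate; assumes linear scaling with block size. */
final class TransferEstimate {
    /** Estimated transfer time in ms for targetBytes, given one observed transfer. */
    static long estimateMillis(long observedBytes, long observedMillis, long targetBytes) {
        return Math.round((double) targetBytes * observedMillis / observedBytes);
    }

    public static void main(String[] args) {
        long observedBytes = 3_072_000L; // block size reported in the log
        long observedMillis = 28L;       // 23:15:26,431 -> 23:15:26,459
        long tenTimesLarger = observedBytes * 10; // same tenfold increase as 6M -> 60M

        System.out.println("estimated window (ms): "
                + estimateMillis(observedBytes, observedMillis, tenTimesLarger));
    }
}
```

Under that assumption, a tenfold larger block would stay in the temporary state for roughly 280 ms. That is longer than the 100 ms poll interval seen in the test log, but it still does not guarantee the test starts polling before the transfer completes.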

      Konstantin Boudnik added a comment -

      This version of the patch increases the size of a block to 15 MB and the file size to 60 MB.
      Also, it now tries to locate the last block of the file rather than the first one.

      Konstantin Boudnik added a comment -

      Verifying second assumption.

      Konstantin Boudnik added a comment -

      Well, the number of blocks has to be two; otherwise more changes will be needed.

      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12424006/HDFS-733.patch
      against trunk revision 832585.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 3 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 findbugs. The patch does not introduce any new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      +1 core tests. The patch passed core unit tests.

      -1 contrib tests. The patch failed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/95/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/95/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/95/artifact/trunk/build/test/checkstyle-errors.html
      Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/95/console

      This message is automatically generated.

      Konstantin Boudnik added a comment -

      Chasing race condition ...

      Konstantin Boudnik added a comment -

      The new patch essentially runs a separate check thread, which starts a new DN and waits for the replication of all blocks to finish. DFSTestUtil.waitReplication, which waits for replication to finish, seems to be flawed: on large blocks it reports the end of replication prematurely, which might be a bug.

      The main thread waits until a temp. replica appears on the new DN and checks the number of blocks pending replication.

      This seems to be working properly.
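
The two-thread arrangement described in this comment might look roughly like the sketch below. It is a simulation under stated assumptions, not the patch itself: the `ReplicaState` enum, the simulated replication that holds the temporary state for a fixed time, and all class and method names are hypothetical, and no Hadoop APIs are used.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical simulation of the approach; does not use any Hadoop classes. */
final class BackgroundReplicationSketch {
    enum ReplicaState { MISSING, TEMPORARY, FINALIZED }

    /** Returns true if the calling thread saw TEMPORARY before it ended or the timeout hit. */
    static boolean observeTemporary(long holdMillis, long timeoutMillis) {
        AtomicReference<ReplicaState> state = new AtomicReference<>(ReplicaState.MISSING);
        ExecutorService checker = Executors.newSingleThreadExecutor();

        // Background thread: stands in for "start the new DN and wait for replication".
        checker.execute(() -> {
            state.set(ReplicaState.TEMPORARY);
            sleepQuietly(holdMillis); // simulated transfer time (an assumption)
            state.set(ReplicaState.FINALIZED);
        });

        // Calling thread: poll while the background work is running.
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        boolean seen = false;
        while (System.nanoTime() - deadline < 0) {
            ReplicaState current = state.get();
            if (current == ReplicaState.TEMPORARY) {
                seen = true;
                break;
            }
            if (current == ReplicaState.FINALIZED) {
                break; // the window has already closed
            }
            sleepQuietly(5);
        }
        checker.shutdownNow();
        return seen;
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("temporary replica observed: " + observeTemporary(300, 2000));
    }
}
```

Because the observer is already polling when the simulated replication begins, it does not depend on starting its wait at exactly the right moment. It still relies on the temporary state lasting longer than one poll interval.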

      Konstantin Boudnik added a comment -

      A more universal approach to checking the replicas' status.

      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12424085/HDFS-733.patch
      against trunk revision 832585.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 3 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 findbugs. The patch does not introduce any new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      +1 core tests. The patch passed core unit tests.

      -1 contrib tests. The patch failed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/97/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/97/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/97/artifact/trunk/build/test/checkstyle-errors.html
      Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/97/console

      This message is automatically generated.

      Konstantin Boudnik added a comment -

      Seems like the problem with the TestBlockReport cases is fixed. The contrib test failure can't be related to the patch.

      Konstantin Boudnik added a comment -

      After checking the console output it seems the contrib failure is this:

      Test org.apache.hadoop.raid.TestRaidPurge FAILED
      

      which is clearly unrelated.

      Konstantin Shvachko added a comment -

      +1

      Konstantin Boudnik added a comment -

      I've just committed this.

      Hudson added a comment -

      Integrated in Hadoop-Hdfs-trunk-Commit #99 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/99/)
      . TestBlockReport fails intermittently. Contributed by Konstantin Boudnik

      Hudson added a comment -

      Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #100 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/100/)

      Hudson added a comment -

      Integrated in Hadoop-Hdfs-trunk #136 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/136/)

      Konstantin Boudnik added a comment -

      Race conditions kept coming back

      Konstantin Boudnik added a comment -

      I've increased the size of a block and of the written file, respectively. The whole thing is pretty weird because no race conditions were ever seen on any of my test machines.

      Konstantin Boudnik added a comment -

      Let's see how it works on the Hudson test nodes.

      Hadoop QA added a comment -

      +1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12424751/HDFS-733.2.patch
      against trunk revision 835179.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 3 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 findbugs. The patch does not introduce any new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      +1 core tests. The patch passed core unit tests.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/72/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/72/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/72/artifact/trunk/build/test/checkstyle-errors.html
      Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/72/console

      This message is automatically generated.

      Hudson added a comment -

      Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #72 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/72/)

      Suresh Srinivas added a comment -

      +1 for the change.

      Konstantin Boudnik added a comment -

      I've run the patched test on the Hudson hardware a few times and everything seems to be all right: no failures are seen. I'm going to commit this shortly.

      Konstantin Boudnik added a comment -

      The fix is committed to the trunk and to the branch 0.21

      Hudson added a comment -

      Integrated in Hadoop-Hdfs-trunk-Commit #110 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/110/)
      . TestBlockReport fails intermittently (cos)

      Hudson added a comment -

      Integrated in Hadoop-Hdfs-trunk #141 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/141/)
      . TestBlockReport fails intermittently (cos)

      Hudson added a comment -

      Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #76 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/76/)
      TestBlockReport fails intermittently (cos)

      Hudson added a comment -

      Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #115 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/115/)

      Eli Collins added a comment -

      I'm seeing this again on trunk (e.g. http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/236/testReport). TestBlockReport passes locally with the same patch applied, and has done so for a while.

      Regression

      org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_08 (from TestBlockReport)

      Failing for the past 1 build (Since #236 )
      Took 4.3 sec.
      Error Message

      Wrong number of PendingReplication blocks expected:<2> but was:<1>
      Stacktrace

      junit.framework.AssertionFailedError: Wrong number of PendingReplication blocks expected:<2> but was:<1>
      at org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_08(TestBlockReport.java:393)

      Eli Collins added a comment -

      I was able to reproduce this locally by looping the test. Here's another assert that failed:

      Testcase: blockReport_08 took 50.467 sec
      FAILED
      Was waiting too long for a replica to become TEMPORARY
      junit.framework.AssertionFailedError: Was waiting too long for a replica to become TEMPORARY
      at org.apache.hadoop.hdfs.server.datanode.TestBlockReport.waitForTempReplica(TestBlockReport.java:483)
      at org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_08(TestBlockReport.java:387)
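
      For readers less familiar with this kind of assert: the message above comes from a poll-with-timeout wait. The following is only a minimal sketch of that general pattern. `PollingWait`, `waitFor`, and all of its parameters are hypothetical names chosen for illustration; this is not the actual `waitForTempReplica` code in TestBlockReport.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration; not the real TestBlockReport code. */
final class PollingWait {
    private PollingWait() {}

    /**
     * Polls the condition every intervalMs milliseconds until it holds or
     * timeoutMs milliseconds have elapsed. Returns the number of polls made,
     * or throws AssertionError once the deadline has passed.
     */
    static int waitFor(BooleanSupplier condition, long intervalMs,
                       long timeoutMs, String what) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        int polls = 0;
        while (true) {
            polls++;
            if (condition.getAsBoolean()) {
                return polls;
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError("Was waiting too long for " + what);
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up on the wait.
                Thread.currentThread().interrupt();
                throw new AssertionError("Interrupted while waiting for " + what);
            }
        }
    }
}
```

      A wait of this shape fails in two ways: the timeout is too short for a slow machine, or the condition is true only briefly and falls between two polls. Raising the timeout helps only with the first.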

      Todd Lipcon added a comment -

      Still happening occasionally in trunk, same error as Eli posted above.

      Konstantin Boudnik added a comment -

      It sure does. It seems that on some occasions the replica either needs more time to become TEMPORARY, or the transition happens so fast that it is already over before the checking thread even kicks in. Damn nasty problem.
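
      To make the second case concrete, here is a small sketch of why polling the current state can miss a short-lived state, and how recording every transition avoids that. Everything in it is an assumption for illustration: `SketchState` and `ReplicaStateHistory` are hypothetical names, the three states are simplified, and this is not necessarily the approach taken in HDFS-1806.

```java
import java.util.EnumSet;
import java.util.Set;

/** Simplified, illustrative states; not the real HDFS replica state enum. */
enum SketchState { ABSENT, TEMPORARY, FINALIZED }

/** Hypothetical tracker that remembers every state a replica has been in. */
final class ReplicaStateHistory {
    private final Set<SketchState> seen = EnumSet.noneOf(SketchState.class);
    private SketchState current;

    ReplicaStateHistory(SketchState initial) {
        current = initial;
        seen.add(initial);
    }

    /** Records the transition so that later checks cannot miss it. */
    synchronized void transitionTo(SketchState next) {
        current = next;
        seen.add(next);
    }

    /** What a polling thread would observe at this instant. */
    synchronized SketchState current() {
        return current;
    }

    /** True if the replica was ever in the given state, however briefly. */
    synchronized boolean hasBeen(SketchState state) {
        return seen.contains(state);
    }
}
```

      If the replica goes ABSENT, TEMPORARY, FINALIZED before the checker's first poll, `current()` never returns TEMPORARY, while `hasBeen(SketchState.TEMPORARY)` still reports that the state occurred.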

      Matt Foley added a comment -

      For analysis and solution, please see HDFS-1806.


        People

        • Assignee: Konstantin Boudnik
        • Reporter: Suresh Srinivas
