Hadoop HDFS / HDFS-2433

TestFileAppend4 fails intermittently


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.20.205.0, 1.0.0
    • Fix Version/s: None
    • Component/s: datanode, namenode, test
    • Labels: None

    Description

      A Jenkins build we run failed twice in a row because of failures in TestFileAppend4.testAppendSyncReplication1. To try to reproduce the error, I ran TestFileAppend4 in a loop overnight and saved the results of each run. (No clean was done between test runs.)
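      The report does not say how the overnight loop was driven. As a rough illustration only, the Java sketch below runs each test body repeatedly, keeps going after a failure, and tallies failures per test. The names `RepeatRunner`, `TestBody`, and `countFailures` are hypothetical and are not part of Hadoop or JUnit; a real loop would invoke TestFileAppend4 itself rather than the fake test used here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical harness, not part of Hadoop or JUnit: runs each named test
 * body repeatedly and counts the iterations in which it threw.
 */
public class RepeatRunner {

    /** Stand-in for a single test method such as testAppendSyncReplication1. */
    public interface TestBody {
        void run() throws Exception;
    }

    /**
     * Runs every test {@code iterations} times, in insertion order, and
     * returns the number of failed iterations per test name.
     */
    public static Map<String, Integer> countFailures(Map<String, TestBody> tests, int iterations) {
        Map<String, Integer> failures = new LinkedHashMap<>();
        for (String name : tests.keySet()) {
            failures.put(name, 0); // tests that never fail are reported as 0
        }
        for (int i = 0; i < iterations; i++) {
            for (Map.Entry<String, TestBody> test : tests.entrySet()) {
                try {
                    test.getValue().run();
                } catch (Throwable t) {
                    // Assertion failures are Errors, not Exceptions, so catch
                    // Throwable; record the failure and keep looping.
                    failures.merge(test.getKey(), 1, Integer::sum);
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Illustrative only: a fake test that fails on every tenth call.
        int[] calls = {0};
        Map<String, TestBody> tests = new LinkedHashMap<>();
        tests.put("alwaysPasses", () -> { });
        tests.put("failsEveryTenth", () -> {
            if (++calls[0] % 10 == 0) {
                throw new AssertionError("simulated flaky failure");
            }
        });
        System.out.println(countFailures(tests, 130)); // {alwaysPasses=0, failsEveryTenth=13}
    }
}
```

      Counting per test instead of stopping at the first failure is what makes it possible to report separate rates for each flaky test, as the description does.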

      When TestFileAppend4 is run in a loop, the testAppendSyncReplication[012] tests fail about 10% of the time (14 out of 130 runs). They all fail with an assertion error similar to the one below. Often only one of these tests fails in a given run, but I have seen as many as two fail in the same run.

      Testcase: testAppendSyncReplication2 took 32.198 sec
              FAILED
      Should have 2 replicas for that block, not 1
      junit.framework.AssertionFailedError: Should have 2 replicas for that block, not 1
              at org.apache.hadoop.hdfs.TestFileAppend4.replicationTest(TestFileAppend4.java:477)
              at org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncReplication2(TestFileAppend4.java:425)
      

      I also saw several other tests in TestFileAppend4 fail during this experiment. They may all be related, so I am filing them under the same JIRA. If it turns out that they are not related, they can be split up later.

      testAppendSyncBlockPlusBbw failed 6 out of 130 times, or about 5% of the time.

      Testcase: testAppendSyncBlockPlusBbw took 1.633 sec
              FAILED
      unexpected file size! received=0 , expected=1024
      junit.framework.AssertionFailedError: unexpected file size! received=0 , expected=1024
              at org.apache.hadoop.hdfs.TestFileAppend4.assertFileSize(TestFileAppend4.java:136)
              at org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncBlockPlusBbw(TestFileAppend4.java:401)
      

      testAppendSyncChecksum[012] failed 2 out of 130 times, or about 1.5% of the time.

      Testcase: testAppendSyncChecksum1 took 32.385 sec
              FAILED
      Should have 1 replica for that block, not 2
      junit.framework.AssertionFailedError: Should have 1 replica for that block, not 2
              at org.apache.hadoop.hdfs.TestFileAppend4.checksumTest(TestFileAppend4.java:556)
              at org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncChecksum1(TestFileAppend4.java:500)
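      The percentages quoted in this description come from dividing each failure count by the 130 overnight runs. The small Java sketch below recomputes them, assuming each count is the number of runs (out of 130) in which that test group failed; the class and method names are illustrative only. It gives roughly 10.8%, 4.6%, and 1.5%, in line with the approximate figures above.

```java
import java.util.Locale;

/**
 * Recomputes the failure percentages quoted in the report. Assumes each
 * count is the number of runs, out of 130, in which that test group failed.
 */
public class FailureRates {

    /** Failed runs as a percentage of total runs, e.g. 14 of 130 is about 10.77. */
    public static double percent(int failedRuns, int totalRuns) {
        if (totalRuns <= 0) {
            throw new IllegalArgumentException("totalRuns must be positive");
        }
        return 100.0 * failedRuns / totalRuns;
    }

    public static void main(String[] args) {
        int totalRuns = 130; // overnight loop size from the report
        String[] groups = {
            "testAppendSyncReplication[012]",
            "testAppendSyncBlockPlusBbw",
            "testAppendSyncChecksum[012]"
        };
        int[] failedRuns = {14, 6, 2}; // failure counts from the report
        for (int i = 0; i < groups.length; i++) {
            // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
            System.out.printf(Locale.ROOT, "%s: %d/%d = %.1f%%%n",
                    groups[i], failedRuns[i], totalRuns, percent(failedRuns[i], totalRuns));
        }
    }
}
```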
      

      I will attach logs for all of the failures. Note that I changed some of the logging messages in this test so that I could better see when testAppendSyncReplication started and ended. Other than that, the code is stock 0.20.205 RC2.

      Attachments

        1. failed.tar.bz2 (3.03 MB, Robert Joseph Evans)


          People

            Assignee: Unassigned
            Reporter: Robert Joseph Evans (revans2)
            Votes: 0
            Watchers: 6
