Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-14807

TestWALLockup is flakey

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0, 1.3.0, 1.1.4, 2.0.0
    • Component/s: test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Fails frequently.

      Looks like this:

      2015-11-12 10:38:51,812 DEBUG [Time-limited test] regionserver.HRegion(3882): Found 0 recovered edits file(s) under /home/jenkins/jenkins-slave/workspace/HBase-1.2/jdk/latest1.7/label/Hadoop/hbase-server/target/test-data/8b8f8f12-1819-47e3-b1f1-8ffa789438ad/data/default/testLockupWhenSyncInMiddleOfZigZagSetup/c8694b53368f3301a8d370089120388d
      2015-11-12 10:38:51,821 DEBUG [Time-limited test] regionserver.FlushLargeStoresPolicy(56): hbase.hregion.percolumnfamilyflush.size.lower.bound is not specified, use global config(16777216) instead
      2015-11-12 10:38:51,880 DEBUG [Time-limited test] wal.WALSplitter(729): Wrote region seqId=/home/jenkins/jenkins-slave/workspace/HBase-1.2/jdk/latest1.7/label/Hadoop/hbase-server/target/test-data/8b8f8f12-1819-47e3-b1f1-8ffa789438ad/data/default/testLockupWhenSyncInMiddleOfZigZagSetup/c8694b53368f3301a8d370089120388d/recovered.edits/2.seqid to file, newSeqId=2, maxSeqId=0
      2015-11-12 10:38:51,881 INFO  [Time-limited test] regionserver.HRegion(868): Onlined c8694b53368f3301a8d370089120388d; next sequenceid=2
      2015-11-12 10:38:51,994 ERROR [sync.1] wal.FSHLog$SyncRunner(1226): Error syncing, request close of WAL
      java.io.IOException: FAKE! Failed to replace a bad datanode...SYNC
      	at org.apache.hadoop.hbase.regionserver.TestWALLockup$1DodgyFSLog$1.sync(TestWALLockup.java:162)
      	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1222)
      	at java.lang.Thread.run(Thread.java:745)
      2015-11-12 10:38:51,997 DEBUG [Thread-4] regionserver.LogRoller(139): WAL roll requested
      2015-11-12 10:38:52,019 DEBUG [flusher] regionserver.FlushLargeStoresPolicy(100): Since none of the CFs were above the size, flushing all.
      2015-11-12 10:38:52,192 INFO  [Thread-4] regionserver.TestWALLockup$1DodgyFSLog(129): LATCHED
      java.lang.InterruptedException: sleep interrupted
      	at java.lang.Thread.sleep(Native Method)
      	at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:146)
      	at org.apache.hadoop.hbase.regionserver.TestWALLockup.testLockupWhenSyncInMiddleOfZigZagSetup(TestWALLockup.java:245)
      2015-11-12 10:39:18,609 INFO  [main] regionserver.TestWALLockup(91): Cleaning test directory: /home/jenkins/jenkins-slave/workspace/HBase-1.2/jdk/latest1.7/label/Hadoop/hbase-server/target/test-data/8b8f8f12-1819-47e3-b1f1-8ffa789438ad
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
      	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.lang.Thread.run(Thread.java:745)
      
      

      ... then times out after being locked up for 30 seconds. Writes 50+MB of logs while spinning.

      Reported as this:

      -------------------------------------------------------------------------------
      Test set: org.apache.hadoop.hbase.regionserver.TestWALLockup
      -------------------------------------------------------------------------------
      Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 198.23 sec <<< FAILURE! - in org.apache.hadoop.hbase.regionserver.TestWALLockup
      testLockupWhenSyncInMiddleOfZigZagSetup(org.apache.hadoop.hbase.regionserver.TestWALLockup)  Time elapsed: 0.049 sec  <<< ERROR!
      org.junit.runners.model.TestTimedOutException: test timed out after 30000 milliseconds
      	at org.apache.log4j.Category.callAppenders(Category.java:205)
      	at org.apache.log4j.Category.forcedLog(Category.java:391)
      	at org.apache.log4j.Category.log(Category.java:856)
      	at org.apache.commons.logging.impl.Log4JLogger.debug(Log4JLogger.java:155)
      	at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1386)
      	at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1352)
      	at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1302)
      	at org.apache.hadoop.hbase.regionserver.TestWALLockup.testLockupWhenSyncInMiddleOfZigZagSetup(TestWALLockup.java:260)
      

      Failed here most recently:

      https://builds.apache.org/view/H-L/view/HBase/job/HBase-1.2/364/jdk=latest1.7,label=Hadoop/

        Attachments

        1. 14807.patch
          4 kB
          Michael Stack
        2. 14807.second.attempt.txt
          7 kB
          Michael Stack
        3. 14807.second.attempt.txt
          7 kB
          Michael Stack

          Issue Links

            Activity

              People

              • Assignee:
                stack Michael Stack
                Reporter:
                stack Michael Stack
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: