Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-12725

Insufficient wait in TestMultiParallel#testFlushCommitsWithAbort might lead to failed assertion on server count

    XMLWordPrintableJSON

Details

    • Test
    • Status: Closed
    • Minor
    • Resolution: Cannot Reproduce
    • None
    • None
    • None
    • None

    Description

      From https://builds.apache.org/job/hbase-0.98/750/testReport/org.apache.hadoop.hbase.client/TestMultiParallel/testFlushCommitsWithAbort/ :

      2014-12-19 01:49:57,877 FATAL [Thread-677] regionserver.HRegionServer(1886): ABORTING region server priapus.apache.org,50706,1418953791906: Aborting for tests
      java.lang.Exception: doTestFlushCommits
        at org.apache.hadoop.hbase.client.TestMultiParallel.doTestFlushCommits(TestMultiParallel.java:268)
        at org.apache.hadoop.hbase.client.TestMultiParallel.testFlushCommitsWithAbort(TestMultiParallel.java:240)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
      2014-12-19 01:49:57,877 FATAL [Thread-677] regionserver.HRegionServer(1894): RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.                   MultiRowMutationEndpoint]
      2014-12-19 01:49:57,917 INFO  [Thread-677] client.TestMultiParallel(287): validating loaded data
      2014-12-19 01:49:58,275 INFO  [Thread-677] client.TestMultiParallel(295): Count=1, Alive=priapus.apache.org,50706,1418953791906
      2014-12-19 01:49:58,275 INFO  [Thread-677] client.TestMultiParallel(295): Count=2, Alive=priapus.apache.org,41103,1418953791972
      2014-12-19 01:49:58,275 INFO  [Thread-677] client.TestMultiParallel(295): Count=3, Alive=priapus.apache.org,33781,1418953792016
      2014-12-19 01:49:58,275 INFO  [Thread-677] client.TestMultiParallel(295): Count=4, Alive=priapus.apache.org,43381,1418953792058
      2014-12-19 01:49:58,276 INFO  [Thread-677] client.TestMultiParallel(295): Count=5, Alive=priapus.apache.org,54398,1418953792103
      2014-12-19 01:49:58,276 INFO  [Thread-677] client.TestMultiParallel(297): Count=5
      2014-12-19 01:49:58,289 INFO  [RS:0;priapus:50706.periodicFlusher] hbase.Chore(93): RS:0;priapus:50706.periodicFlusher exiting
      2014-12-19 01:49:58,290 INFO  [RS:0;priapus:50706.leaseChecker] regionserver.Leases(147): RS:0;priapus:50706.leaseChecker closing leases
      2014-12-19 01:49:58,290 INFO  [RS:0;priapus:50706.leaseChecker] regionserver.Leases(150): RS:0;priapus:50706.leaseChecker closed leases
      2014-12-19 01:49:58,291 INFO  [RS:0;priapus:50706] regionserver.CompactSplitThread(360): Waiting for Split Thread to finish...
      2014-12-19 01:49:58,291 INFO  [RS:0;priapus:50706] regionserver.CompactSplitThread(360): Waiting for Merge Thread to finish...
      2014-12-19 01:49:58,291 INFO  [RS:0;priapus:50706] regionserver.CompactSplitThread(360): Waiting for Large Compaction Thread to finish...
      2014-12-19 01:49:58,291 INFO  [RS:0;priapus:50706] regionserver.CompactSplitThread(360): Waiting for Small Compaction Thread to finish...
      2014-12-19 01:49:58,295 DEBUG [RS:0;priapus:50706-EventThread] zookeeper.ZooKeeperWatcher(312): regionserver:50706-0x14a603c969c0001, quorum=localhost:53936, baseZNode=/hbase    Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/replication/rs/priapus.apache.org,50706,1418953791906
      2014-12-19 01:49:58,295 INFO  [RS:0;priapus:50706] client.HConnectionManager$HConnectionImplementation(1893): Closing zookeeper sessionid=0x14a603c969c0010
      2014-12-19 01:49:58,302 DEBUG [RS:4;priapus:54398-EventThread] zookeeper.ZooKeeperWatcher(312): regionserver:54398-0x14a603c969c0004, quorum=localhost:53936, baseZNode=/hbase    Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/rs/priapus.apache.org,50706,1418953791906
      2014-12-19 01:49:58,302 DEBUG [RS:2;priapus:33781-EventThread] zookeeper.ZooKeeperWatcher(312): regionserver:33781-0x14a603c969c0002, quorum=localhost:53936, baseZNode=/hbase    Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/rs/priapus.apache.org,50706,1418953791906
      2014-12-19 01:49:58,302 DEBUG [RS:0;priapus:50706-EventThread] zookeeper.ZooKeeperWatcher(312): regionserver:50706-0x14a603c969c0001, quorum=localhost:53936, baseZNode=/hbase    Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/rs/priapus.apache.org,50706,1418953791906
      2014-12-19 01:49:58,302 DEBUG [RS:3;priapus:43381-EventThread] zookeeper.ZooKeeperWatcher(312): regionserver:43381-0x14a603c969c0005, quorum=localhost:53936, baseZNode=/hbase    Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/rs/priapus.apache.org,50706,1418953791906
      2014-12-19 01:49:58,302 DEBUG [main-EventThread] zookeeper.ZooKeeperWatcher(312): master:48016-0x14a603c969c0000, quorum=localhost:53936, baseZNode=/hbase Received ZooKeeper     Event, type=NodeDeleted, state=SyncConnected, path=/hbase/rs/priapus.apache.org,50706,1418953791906
      2014-12-19 01:49:58,302 DEBUG [RS:1;priapus:41103-EventThread] zookeeper.ZooKeeperWatcher(312): regionserver:41103-0x14a603c969c0003, quorum=localhost:53936, baseZNode=/hbase    Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/rs/priapus.apache.org,50706,1418953791906
      2014-12-19 01:49:58,303 INFO  [main-EventThread] zookeeper.RegionServerTracker(118): RegionServer ephemeral node deleted, processing expiration [priapus.apache.org,50706,        1418953791906]
      2014-12-19 01:49:58,302 DEBUG [RS:0;priapus:50706-EventThread] zookeeper.ZooKeeperWatcher(312): regionserver:50706-0x14a603c969c0001, quorum=localhost:53936, baseZNode=/hbase    Received ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase/rs
      2014-12-19 01:49:58,309 INFO  [RS:0;priapus:50706] regionserver.HRegionServer(1065): stopping server priapus.apache.org,50706,1418953791906; zookeeper connection closed.
      2014-12-19 01:49:58,309 INFO  [RS:0;priapus:50706] regionserver.HRegionServer(1068): RS:0;priapus:50706 exiting
      

      region server priapus.apache.org,50706,1418953791906 abort was initiated @ 01:49:57,877
      Check for live region server count was performed @ 01:49:58,275
      However, region server exited @ 01:49:58,309

      Here is the code in doTestFlushCommits() of 0.98:

            while (liveRS.getRegionServer().getNumberOfOnlineRegions() != 0) {
              Thread.sleep(10);
      

      In master branch, the sleep duration is 100

      Attachments

        1. 12725-0.98.txt
          0.8 kB
          Ted Yu

        Activity

          People

            Unassigned Unassigned
            yuzhihong@gmail.com Ted Yu
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: