HBASE-8900

TestRSKilledWhenMasterInitializing.testCorrectnessWhenMasterFailOver is flakey

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: test
    • Labels:
      None
    • Tags:
      noob

      Description

      Failed here:

      https://builds.apache.org/job/hbase-0.95-on-hadoop2/169/testReport/junit/org.apache.hadoop.hbase.regionserver/TestRSKilledWhenMasterInitializing/testCorrectnessWhenMasterFailOver/

      and

      http://54.241.6.143/job/HBase-0.95-Hadoop-2/579/org.apache.hbase$hbase-server/testReport/junit/org.apache.hadoop.hbase.regionserver/TestRSKilledWhenMasterInitializing/org_apache_hadoop_hbase_regionserver_TestRSKilledWhenMasterInitializing/

      java.lang.Exception: test timed out after 120000 milliseconds
      	at java.lang.Thread.sleep(Native Method)
      	at org.apache.hadoop.hbase.zookeeper.ZKAssign.blockUntilNoRIT(ZKAssign.java:1002)
      	at org.apache.hadoop.hbase.regionserver.TestRSKilledWhenMasterInitializing.testCorrectnessWhenMasterFailOver(TestRSKilledWhenMasterInitializing.java:177)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
      

      and with this:

      java.lang.NullPointerException
      	at org.apache.hadoop.hbase.regionserver.TestRSKilledWhenMasterInitializing.tearDownAfterClass(TestRSKilledWhenMasterInitializing.java:83)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
      	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
      	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
      	at org.junit.runners.Suite.runChild(Suite.java:127)
      	at org.junit.runners.Suite.runChild(Suite.java:26)
      	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
      	at java.lang.Thread.run(Thread.java:662)
      
      

        Activity

        stack added a comment -

        I am just going to remove it for now.

        stack added a comment -

        Patch to remove the test. When someone has time, they can figure out why it fails sporadically in a few places and breaks builds.

        stack added a comment -

        Committed removal of test from trunk and 0.95. Will leave open for someone who wants to restore.

        Hudson added a comment -

        Integrated in hbase-0.95-on-hadoop2 #170 (See https://builds.apache.org/job/hbase-0.95-on-hadoop2/170/)
        HBASE-8900 TestRSKilledWhenMasterInitializing.testCorrectnessWhenMasterFailOver is flakey (Revision 1501023)

        Result = FAILURE
        stack :
        Files :

        • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java
        Hudson added a comment -

        Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #604 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/604/)
        HBASE-8900 TestRSKilledWhenMasterInitializing.testCorrectnessWhenMasterFailOver is flakey (Revision 1501022)

        Result = FAILURE
        stack :
        Files :

        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java
        Hudson added a comment -

        Integrated in hbase-0.95 #300 (See https://builds.apache.org/job/hbase-0.95/300/)
        HBASE-8900 TestRSKilledWhenMasterInitializing.testCorrectnessWhenMasterFailOver is flakey (Revision 1501023)

        Result = SUCCESS
        stack :
        Files :

        • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java
        Hudson added a comment -

        Integrated in HBase-TRUNK #4227 (See https://builds.apache.org/job/HBase-TRUNK/4227/)
        HBASE-8900 TestRSKilledWhenMasterInitializing.testCorrectnessWhenMasterFailOver is flakey (Revision 1501022)

        Result = SUCCESS
        stack :
        Files :

        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java
        stack added a comment -

        I put back this test on trunk. I left it out of 0.95.

        Hudson added a comment -

        Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #606 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/606/)
        HBASE-8900 TestRSKilledWhenMasterInitializing.testCorrectnessWhenMasterFailOver is flakey; PUTTING BACK THIS PATCH ON TRUNK (Revision 1501542)

        Result = FAILURE
        stack :
        Files :

        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java
        Hudson added a comment -

        Integrated in HBase-TRUNK #4232 (See https://builds.apache.org/job/HBase-TRUNK/4232/)
        HBASE-8900 TestRSKilledWhenMasterInitializing.testCorrectnessWhenMasterFailOver is flakey; PUTTING BACK THIS PATCH ON TRUNK (Revision 1501542)

        Result = FAILURE
        stack :
        Files :

        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java
        stack added a comment -

        Had to actually remove the files just now. An empty file was left in place in 0.95, causing RAT to fail (thanks, Jean-Daniel Cryans). I just removed this test from trunk too, since it breaks from time to time. I am leaving the issue open so that we won't forget it and so that anyone who wants to can fix up the test.

        Hudson added a comment -

        SUCCESS: Integrated in hbase-0.95 #335 (See https://builds.apache.org/job/hbase-0.95/335/)
        HBASE-8900 TestRSKilledWhenMasterInitializing.testCorrectnessWhenMasterFailOver is flakey; ACTUALLY REMOVE FILE, WAS BREAKING RAT CHECK (stack: rev 1504358)

        • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java
        Hudson added a comment -

        SUCCESS: Integrated in HBase-TRUNK #4266 (See https://builds.apache.org/job/HBase-TRUNK/4266/)
        HBASE-8900 TestRSKilledWhenMasterInitializing.testCorrectnessWhenMasterFailOver is flakey; ACTUALLY REMOVE FILE, WAS BREAKING RAT CHECK (stack: rev 1504359)

        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java
        Hudson added a comment -

        SUCCESS: Integrated in hbase-0.95-on-hadoop2 #184 (See https://builds.apache.org/job/hbase-0.95-on-hadoop2/184/)
        HBASE-8900 TestRSKilledWhenMasterInitializing.testCorrectnessWhenMasterFailOver is flakey; ACTUALLY REMOVE FILE, WAS BREAKING RAT CHECK (stack: rev 1504358)

        • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #622 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/622/)
        HBASE-8900 TestRSKilledWhenMasterInitializing.testCorrectnessWhenMasterFailOver is flakey; ACTUALLY REMOVE FILE, WAS BREAKING RAT CHECK (stack: rev 1504359)

        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java
        ramkrishna.s.vasudevan added a comment -

        Actually, here we got an HDFS exception:

        Could not obtain block: BP-1276573177-67.195.138.60-1373240501128:blk_-2451087782628280101_1034 file=/user/jenkins/hbase/testCorrectnessWhenMasterFailOver/fbf43cbd6a7509dbedc9f0fa410c46a9/recovered.edits/0000000000000000002
        

        There is also an AccessControlException, which prevented the HlogReader from reading the file after failover. Hence one of the regions stayed in RIT (region in transition) and the test case timed out.
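
        The timeout in the description comes from a wait that sleeps until no region is in transition; if one region never leaves RIT, that wait only ends when JUnit kills the test. A minimal sketch of a bounded wait that returns a result the test can assert on, assuming a generic polling condition. The class BoundedWait and the method waitFor are hypothetical names for illustration, not part of HBase or ZKAssign:

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not HBase code: polls a condition until it holds
// or a deadline passes, instead of sleeping in an unbounded loop.
final class BoundedWait {
    private BoundedWait() {}

    // Returns true if the condition became true within timeoutMs,
    // false if the deadline passed or the thread was interrupted.
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false;
            }
            try {
                // Never sleep past the deadline; sleep at least 1 ms.
                Thread.sleep(Math.max(1L, Math.min(pollMs, remainingMs)));
            } catch (InterruptedException e) {
                // Preserve the interrupt flag for the caller and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

        A test could then fail with its own message (for example, naming the region still in transition) when waitFor returns false, rather than surfacing only as a generic 120000 ms timeout.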

        stack added a comment -

        ramkrishna.s.vasudevan Anything in the log on what happened to the block? On the ACE issue, do you know whether the log was from before I disabled short-circuit read for all unit tests? Thanks for taking a look.

        stack added a comment -

        I am going to remove it from trunk too.

        Here we have one of those silent failures. If I compare the list of tests that passed on a successful run against those that show up in this failed run, I get this difference:

        durruti:trunk stack$ diff /tmp/bad_trunk.txt /tmp/good_trunk.txt
        91a92
        > Running org.apache.hadoop.hbase.mapreduce.TestMultiTableInputFormat
        159a161
        > Running org.apache.hadoop.hbase.regionserver.TestRSKilledWhenMasterInitializing
        176a179,180
        > Running org.apache.hadoop.hbase.regionserver.wal.TestHLogSplitCompressed
        > Running org.apache.hadoop.hbase.regionserver.wal.TestLogRollAbort
        183,185c187,188
        < Running org.apache.hadoop.hbase.replication.TestReplicationKillMasterRS
        < Running org.apache.hadoop.hbase.replication.TestReplicationKillMasterRSCompressed
        < Running org.apache.hadoop.hbase.replication.TestReplicationKillSlaveRS
        ---
        > Running org.apache.hadoop.hbase.replication.TestReplicationQueueFailover
        > Running org.apache.hadoop.hbase.replication.TestReplicationQueueFailoverCompressed
        200a204
        > Running org.apache.hadoop.hbase.rest.client.TestRemoteAdmin
        

        TestMultiTableInputFormat has been removed already. TestReplicationKillMasterRS is likely new since the good run, and so on.

        TestRSKilledWhenMasterInitializing is in the list. I'm going to remove it for now until it has been fixed. It is already removed from 0.95. Doing the same for trunk so we can get clean builds.
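
        The comparison above was done with diff over two files of "Running ..." lines. A minimal Java sketch of the same idea, assuming each build log has already been read into a list of lines; the class TestListDiff and its methods are hypothetical names for illustration, not part of HBase or its build tooling:

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: finds test classes that were reported as
// "Running <class>" in a good build but never appear in a bad build.
final class TestListDiff {
    private static final String PREFIX = "Running ";

    private TestListDiff() {}

    // Extracts the class names from lines of the form "Running <class>".
    // Lines without that prefix are ignored.
    static Set<String> testNames(List<String> lines) {
        Set<String> names = new TreeSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.startsWith(PREFIX)) {
                names.add(trimmed.substring(PREFIX.length()).trim());
            }
        }
        return names;
    }

    // Returns, in sorted order, the tests present in the good run
    // but absent from the bad run.
    static Set<String> missingFrom(List<String> goodLines, List<String> badLines) {
        Set<String> missing = testNames(goodLines);
        missing.removeAll(testNames(badLines));
        return missing;
    }
}
```

        As the comment notes, a name in the result is only a candidate: a test may be missing because it was removed between the two builds, not because it died silently, so both builds should come from comparable revisions.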

        stack added a comment -

        Hmm... nvm. Already removed above. I was comparing old builds.

        stack added a comment -

        Resolving. If someone wants to fix up the test, that'd be good but closing out this old issue.


          People

          • Assignee:
            ramkrishna.s.vasudevan
            Reporter:
            stack
          • Votes:
            0
            Watchers:
            3
