HBASE-5849

On first cluster startup, RS aborts if root znode is not available

    Details

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Rather than exiting, the region server now waits if the root znode in ZooKeeper has not yet been created.

      Description

      When launching a brand-new cluster, the master has to be started first, so starting the master and the region servers at the same time creates a race condition.

      Master startup code is something like this:

      • establish zk connection
      • create root znodes in zk (/hbase)
      • create ephemeral node for master (/hbase/master)

      Region server startup code is something like this:

      • establish zk connection
      • check whether the root znode (/hbase) is there. If not, shut down.
      • wait for the master to create znodes /hbase/master

      So the problem is that on the very first launch of the cluster, the RS aborts at startup because the /hbase znode might not have been created yet (only the master creates it, if needed). Since /hbase is not deleted on cluster shutdown, on subsequent cluster starts it does not matter in which order the servers are started. So this affects only first launches.
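
      The race described above, and the "wait instead of abort" behaviour from the release note, can be sketched roughly as follows. This is a toy model, not the HBase patch: FakeZk, startRegionServerOld, and waitForZnode are hypothetical names, an in-memory set stands in for ZooKeeper, and the timeout and poll interval are arbitrary assumptions.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of the first-boot race; hypothetical names, not HBase code. */
public class BootOrderSketch {

    /** Stand-in for ZooKeeper: a thread-safe set of existing znode paths. */
    static final class FakeZk {
        private final Set<String> znodes = ConcurrentHashMap.newKeySet();
        void create(String path) { znodes.add(path); }
        boolean exists(String path) { return znodes.contains(path); }
    }

    /** Old behaviour: the region server gives up at once if /hbase is missing. */
    static boolean startRegionServerOld(FakeZk zk) {
        return zk.exists("/hbase");
    }

    /** New behaviour: poll until the znode appears or the deadline passes. */
    static boolean waitForZnode(FakeZk zk, String path, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!zk.exists(path)) {
            if (System.nanoTime() >= deadline) {
                return false; // timed out; the caller decides whether to abort
            }
            Thread.sleep(pollMs);
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        FakeZk zk = new FakeZk();

        // Region server started before the master on a fresh cluster.
        System.out.println("abort-on-missing RS started: " + startRegionServerOld(zk));

        // The master comes up a little later and creates its znodes.
        Thread master = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            zk.create("/hbase");
            zk.create("/hbase/master");
        });
        master.start();

        // A waiting region server rides out the gap instead of aborting.
        boolean started = waitForZnode(zk, "/hbase", 5000, 20)
                && waitForZnode(zk, "/hbase/master", 5000, 20);
        master.join();
        System.out.println("waiting RS started: " + started);
    }
}
```

      A real region server would more likely use a ZooKeeper watch than a sleep loop; polling is used here only to keep the sketch self-contained.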

      1. 5849v3.txt
        5 kB
        stack
      2. HBASE-5849_v1.patch
        1.0 kB
        Enis Soztutar
      3. HBASE-5849_v2.patch
        5 kB
        Enis Soztutar
      4. HBASE-5849_v4.patch
        5 kB
        Enis Soztutar
      5. HBASE-5849_v4.patch
        5 kB
        Enis Soztutar
      6. HBASE-5849_v4.patch
        5 kB
        Enis Soztutar
      7. HBASE-5849_v4-0.92.patch
        5 kB
        Enis Soztutar

        Activity

        Hudson added a comment -

        Integrated in HBase-0.92-security #106 (See https://builds.apache.org/job/HBase-0.92-security/106/)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available; REAPPLY (Revision 1330119)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available; REAPPLY (Revision 1330118)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available; REVERT (Revision 1329562)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available (Revision 1329548)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available (Revision 1329530)

        Result = SUCCESS
        stack :
        Files :

        • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/TestClusterBootOrder.java

        stack :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java

        stack :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/TestClusterBootOrder.java

        stack :
        Files :

        • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/TestClusterBootOrder.java

        stack :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/TestClusterBootOrder.java
        Hudson added a comment -

        Integrated in HBase-TRUNK-security #184 (See https://builds.apache.org/job/HBase-TRUNK-security/184/)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available; REAPPLY (Revision 1330116)

        Result = FAILURE
        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestClusterBootOrder.java
        Enis Soztutar added a comment -

        Thanks all for pursuing this. From the failed Hudson builds:
        https://builds.apache.org/job/HBase-TRUNK-security/183/
        https://builds.apache.org/job/HBase-TRUNK/2811/testReport/
        https://builds.apache.org/job/HBase-0.92/390/
        https://builds.apache.org/job/HBase-0.94-security/21/
        None of the tests seem related.

        @Stack, as for EvictionThread: since the git repo is falling behind, I might not have your recent changes (I'm too lazy to check out from svn). I did also see some other daemon threads, though (a couple of IPC Client threads, etc.). Let me dig into that later and see if we can improve on it. I'll open another jira if I find anything interesting.

        Hudson added a comment -

        Integrated in HBase-0.94-security #21 (See https://builds.apache.org/job/HBase-0.94-security/21/)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available; REAPPLY (Revision 1330117)

        Result = FAILURE
        stack :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/TestClusterBootOrder.java
        Hudson added a comment -

        Integrated in HBase-0.92 #390 (See https://builds.apache.org/job/HBase-0.92/390/)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available; REAPPLY (Revision 1330119)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available; REAPPLY (Revision 1330118)

        Result = FAILURE
        stack :
        Files :

        • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/TestClusterBootOrder.java

        stack :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        Hudson added a comment -

        Integrated in HBase-0.94 #148 (See https://builds.apache.org/job/HBase-0.94/148/)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available; REAPPLY (Revision 1330117)

        Result = SUCCESS
        stack :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/TestClusterBootOrder.java
        Hudson added a comment -

        Integrated in HBase-TRUNK #2811 (See https://builds.apache.org/job/HBase-TRUNK/2811/)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available; REAPPLY (Revision 1330116)

        Result = FAILURE
        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestClusterBootOrder.java
        Hudson added a comment -

        Integrated in HBase-TRUNK-security #183 (See https://builds.apache.org/job/HBase-TRUNK-security/183/)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available; REVERT (Revision 1329560)

        Result = FAILURE
        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestClusterBootOrder.java
        stack added a comment -

        Applied to 0.92, 0.94, and trunk (took Ted's word for it that it works; thanks Ted). Thanks for the patch, Enis, and for digging in again.

        Lars Hofhansl added a comment -

        TestRegionRebalancing is unrelated (see HBASE-5848). TestReplication passes for me locally with v4 applied.

        stack added a comment -

        @Enis LruBlockCache.EvictionThread should be cleaned up on cluster shutdown? I thought I fixed that a day or so ago.

        Ted Yu added a comment -

        I looped TestClusterBootOrder 5 times using patch v4 and didn't see a hanging test.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12524099/HBASE-5849_v4.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.TestRegionRebalancing
        org.apache.hadoop.hbase.replication.TestReplication

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1637//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1637//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1637//console

        This message is automatically generated.

        Hudson added a comment -

        Integrated in HBase-0.94-security #20 (See https://builds.apache.org/job/HBase-0.94-security/20/)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available; REVERT (Revision 1329561)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available (Revision 1329528)

        Result = SUCCESS
        stack :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/TestClusterBootOrder.java

        stack :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/TestClusterBootOrder.java
        Enis Soztutar added a comment -

        Reattaching for Jenkins.

        Enis Soztutar added a comment -

        I have found 2 issues that caused timeouts in the 0.92 branch:
        1. The hbase dir was not set up to use the temp dir under target/, but used the default one under /tmp/hadoop-${username}, so running the test on 0.92 causes the rs to not come up if you have dirty data under /tmp/.
        2. Giving timeouts like @Test(timeout=xxx) causes the 0.92 master to not shut down properly. I could not inspect this further; there might be an issue with surefire.

        As a result, I updated the patch to first boot up a mini dfs and set up the hbase dir. I also removed the timeouts (the test runner (maven) will time out instead if something goes wrong).

        All my tests for trunk, 0.94, and 0.92 seem to pass.

        @Ted, @Stack, can you please try the patch to see whether you can replicate?

        On an unrelated note, the ResourceChecker reports that some of the daemon threads (like LruBlockCache.EvictionThread) are not shut down properly (even when using MiniHBaseCluster and shutting it down properly). Any idea whether we should dig into that?

        stack added a comment -

        @Enis Agreed. I tried this before applying too.

        Enis Soztutar added a comment -

        Interesting that Hudson did not report any test failures. Let me dig into this.

        Hudson added a comment -

        Integrated in HBase-0.92 #388 (See https://builds.apache.org/job/HBase-0.92/388/)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available; REVERT (Revision 1329562)

        Result = SUCCESS
        stack :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/TestClusterBootOrder.java
        Hudson added a comment -

        Integrated in HBase-0.94 #141 (See https://builds.apache.org/job/HBase-0.94/141/)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available; REVERT (Revision 1329561)

        Result = FAILURE
        stack :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/TestClusterBootOrder.java
        Hudson added a comment -

        Integrated in HBase-TRUNK #2804 (See https://builds.apache.org/job/HBase-TRUNK/2804/)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available; REVERT (Revision 1329560)

        Result = SUCCESS
        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestClusterBootOrder.java
        Hudson added a comment -

        Integrated in HBase-TRUNK-security #182 (See https://builds.apache.org/job/HBase-TRUNK-security/182/)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available (Revision 1329527)

        Result = SUCCESS
        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestClusterBootOrder.java
        stack added a comment -

        Enis, mind taking a look at this?

        stack added a comment -

        I killed all running builds in case they'd run into this hang.

        Hudson added a comment -

        Integrated in HBase-0.92 #387 (See https://builds.apache.org/job/HBase-0.92/387/)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available (Revision 1329548)

        Result = ABORTED
        stack :
        Files :

        • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/TestClusterBootOrder.java
        stack added a comment -

        Reopening. Backing out patch.

        stack added a comment -

        So, yes, I'm seeing what Ted reports above.

        stack added a comment -

        I mean, it even passed hadoopqa above, apart from my testing. Backing it out though... it's an ugly hang when it happens.

        stack added a comment -

        There is something wrong now. This test won't complete for me (though it did previously). I thought it was the subsequent commit:

        ------------------------------------------------------------------------
        r1329555 | larsh | 2012-04-23 22:12:45 -0700 (Mon, 23 Apr 2012) | 1 line
        
        Refuse operations from Admin before master is initialized - fix for all branches
        

        ...that was bringing on the problem, but after removing that, it's still not completing.

        I poked around in the debugger and was getting an NPE in reportForDuty after the master came up because this.hbaseMaster was null; we were failing to allocate the Interface (hard to trace because toString would throw its own exception).

        For now backing this out.

        stack added a comment -

        I tried it before committing and it passed then. I just tried it on trunk now:

        -------------------------------------------------------
         T E S T S
        -------------------------------------------------------
        Running org.apache.hadoop.hbase.TestClusterBootOrder
        2012-04-23 21:27:45.213 java[97823:d007] Unable to load realm info from SCDynamicStore
        Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.727 sec
        
        Results :
        
        Tests run: 2, Failures: 0, Errors: 0, Skipped: 0
        
        [INFO] 
        [INFO] --- maven-surefire-plugin:2.10:test (secondPartTestsExecution) @ hbase ---
        [INFO] Tests are skipped.
        [INFO] ------------------------------------------------------------------------
        [INFO] BUILD SUCCESS
        [INFO] ------------------------------------------------------------------------
        [INFO] Total time: 34.313s
        [INFO] Finished at: Mon Apr 23 21:28:02 PDT 2012
        [INFO] Final Memory: 21M/81M
        [INFO] ------------------------------------------------------------------------
        
        Hudson added a comment -

        Integrated in HBase-0.94 #139 (See https://builds.apache.org/job/HBase-0.94/139/)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available (Revision 1329528)

        Result = FAILURE
        stack :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/TestClusterBootOrder.java
        Hudson added a comment -

        Integrated in HBase-TRUNK #2803 (See https://builds.apache.org/job/HBase-TRUNK/2803/)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available (Revision 1329527)

        Result = SUCCESS
        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestClusterBootOrder.java
        Hudson added a comment -

        Integrated in HBase-0.92 #386 (See https://builds.apache.org/job/HBase-0.92/386/)
        HBASE-5849 On first cluster startup, RS aborts if root znode is not available (Revision 1329530)

        Result = FAILURE
        stack :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/TestClusterBootOrder.java
        stack added a comment -

        Committed to 0.92, 0.94, and to trunk. Thanks for the patch Enis.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12523905/5849v3.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1622//console

        This message is automatically generated.

        stack added a comment -

        Enis's v2 patch with this added to end of test:

        +  @org.junit.Rule
        +  public org.apache.hadoop.hbase.ResourceCheckerJUnitRule cu =
        +    new org.apache.hadoop.hbase.ResourceCheckerJUnitRule();
        

        Nice test.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12523888/HBASE-5849_v2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1620//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1620//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1620//console

        This message is automatically generated.

        Enis Soztutar added a comment -

        Rerunning hudson for patch v2.

        Enis Soztutar added a comment -

        Thanks Stack for taking a look into this. I have added a unit test for boot order for the cluster.

        To answer your earlier comment, I think the region server should just keep waiting until there is an active master.
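
        To make the "keep waiting" behaviour concrete, here is a minimal sketch of a polling loop. It is not the code from the patch: `ZnodeWaiter`, `waitForZnode`, and the two suppliers are hypothetical names, and an in-memory flag stands in for the real ZooKeeper existence check.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class ZnodeWaiter {

    /**
     * Polls until znodeExists reports true (returns true), or until
     * stopRequested reports true or the thread is interrupted (returns false).
     * Both suppliers are stand-ins for real ZooKeeper / server-state checks.
     */
    public static boolean waitForZnode(BooleanSupplier znodeExists,
                                       BooleanSupplier stopRequested,
                                       long pollMillis) {
        while (!znodeExists.getAsBoolean()) {
            if (stopRequested.getAsBoolean()) {
                return false; // server is shutting down, give up waiting
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        // Simulated root znode: absent at first, as on a fresh cluster.
        AtomicBoolean rootZnode = new AtomicBoolean(false);

        // Simulated master that creates the root znode a little later.
        Thread master = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                return;
            }
            rootZnode.set(true);
        });
        master.start();

        // The "region server" starts first and waits instead of aborting.
        boolean ready = waitForZnode(rootZnode::get, () -> false, 20);
        master.join();
        System.out.println("root znode available: " + ready);
    }
}
```

        The stop check is there so that a region server that is asked to shut down while the master is still missing does not wait forever.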

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12523865/HBASE-5849_v1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks
        org.apache.hadoop.hbase.util.TestProcessBasedCluster

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1617//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1617//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1617//console

        This message is automatically generated.

        stack added a comment -

        Patch lgtm.

        Enis Soztutar added a comment -

        Attaching a simple patch. Applies to trunk, 0.92 and 0.94 branches.

        Tested this with a pseudo-distributed setup on my laptop by launching the regionserver first and observing that it waits for the master to boot up instead of aborting. I'll try to come up with a boot order unit test shortly.

        stack added a comment -

        Sounds good Enis. What should RS do then?

        Enis Soztutar added a comment -

        Upon further inspection, it seems the patch for HBASE-4138 added the check for the base znode to the region server startup code. While it makes sense to check for znode.parent from the client side, we should not do that for the regionserver.
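
        The distinction between the two sides can be sketched as follows. This is only an illustration under stated assumptions: `ParentZnodeCheck` and both methods are hypothetical, and a plain `Set<String>` stands in for the znodes currently present in ZooKeeper.

```java
import java.util.Collections;
import java.util.Set;

public class ParentZnodeCheck {

    /**
     * Client-side policy: a missing parent znode most likely means a
     * misconfigured zookeeper.znode.parent, so fail fast with a clear error.
     */
    public static void checkParentForClient(Set<String> znodes, String parent) {
        if (!znodes.contains(parent)) {
            throw new IllegalStateException(
                "Parent znode " + parent + " not found; check zookeeper.znode.parent");
        }
    }

    /**
     * Region-server policy: on a first cluster start the master may simply
     * not have created the parent yet, so a missing znode means "keep waiting".
     */
    public static boolean shouldKeepWaiting(Set<String> znodes, String parent) {
        return !znodes.contains(parent);
    }

    public static void main(String[] args) {
        Set<String> freshCluster = Collections.emptySet();

        // Region server: no error, it just waits.
        System.out.println("RS keeps waiting: " + shouldKeepWaiting(freshCluster, "/hbase"));

        // Client: the same situation is reported as an error.
        try {
            checkParentForClient(freshCluster, "/hbase");
        } catch (IllegalStateException e) {
            System.out.println("Client error: " + e.getMessage());
        }
    }
}
```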


          People

          • Assignee:
            Enis Soztutar
            Reporter:
            Enis Soztutar