HBASE-5918

Master will block forever at startup if root server dies between assigning root and assigning meta

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.92.1
    • Fix Version/s: 0.94.1, 0.95.0
    • Component/s: master
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When the master is initializing, if the server carrying root dies between assigning root and assigning meta, the master will block in
      HMaster#assignRootAndMeta:

      assignmentManager.assignMeta();
      this.catalogTracker.waitForMeta();

      because ServerShutdownHandler is disabled at that point.

      So we should enable ServerShutdownHandler after calling assignmentManager.assignMeta().
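The ordering problem described above can be illustrated with a small toy model. This is only a sketch and not HBase code: the class `StartupOrderingSketch` and all of its members are hypothetical names, and it assumes (as a simplification) that meta can be located only while root is online and that recovery of a dead server happens only when the shutdown handler is enabled. The wait is bounded so the example terminates instead of blocking forever.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of the startup ordering. NOT HBase code; all names are hypothetical. */
public class StartupOrderingSketch {
    // Stand-in for the flag that gates ServerShutdownHandler during master startup.
    private boolean shutdownHandlerEnabled = false;
    // Servers that died while the handler was disabled wait here.
    private final Queue<String> deadNotExpired = new ArrayDeque<>();
    // Assumption of this model: meta can be located only while root is online.
    private boolean rootOnline = true;

    /** The server carrying root dies; recovery runs only if the handler is enabled. */
    void rootServerDied(String server) {
        rootOnline = false;
        if (shutdownHandlerEnabled) {
            recover(server);
        } else {
            deadNotExpired.add(server);
        }
    }

    /** Enable the handler and process every server that died in the meantime. */
    void enableShutdownHandler() {
        shutdownHandlerEnabled = true;
        while (!deadNotExpired.isEmpty()) {
            recover(deadNotExpired.poll());
        }
    }

    /** Stand-in for reassigning root to a live server. */
    private void recover(String server) {
        rootOnline = true;
    }

    /** Bounded stand-in for waitForMeta(); the real call would keep waiting. */
    boolean waitForMeta(int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            if (rootOnline) {
                return true;
            }
        }
        return false;
    }

    /** Runs one simulated startup; returns whether meta became reachable. */
    static boolean startup(boolean enableHandlerBeforeWait) {
        StartupOrderingSketch master = new StartupOrderingSketch();
        // Root has been assigned; its server dies before meta is assigned.
        master.rootServerDied("rs1");
        // assignMeta() would be issued here.
        if (enableHandlerBeforeWait) {
            master.enableShutdownHandler();
        }
        return master.waitForMeta(3);
    }

    public static void main(String[] args) {
        System.out.println("handler still disabled: " + startup(false)); // false
        System.out.println("handler enabled first:  " + startup(true));  // true
    }
}
```

In the first run the dead server stays queued and the wait never succeeds; in the second, enabling the handler before the wait lets recovery run.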

      1. HBASE-5918_0.94
        2 kB
        ramkrishna.s.vasudevan
      2. HBASE-5918.patch
        0.7 kB
        chunhui shen
      3. HBASE-5918.patch
        0.7 kB
        chunhui shen
      4. HBASE-5918V2.patch
        2 kB
        chunhui shen

        Activity

        Hudson added a comment -

        Integrated in HBase-0.94-security #37 (See https://builds.apache.org/job/HBase-0.94-security/37/)
        HBASE-5918 Master will block forever at startup if root server dies between assigning root and assigning meta

        Submitted by:Chunhui
        Reviewed by:Stack, Ted, Ram (Revision 1352229)

        Result = SUCCESS
        ramkrishna :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        Hudson added a comment -

        Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #61 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/61/)
        HBASE-5918 Master will block forever at startup if root server dies between assigning root and assigning meta (Chunhui) (Revision 1352161)

        Result = FAILURE
        tedyu :
        Files :

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        Hudson added a comment -

        Integrated in HBase-0.94 #263 (See https://builds.apache.org/job/HBase-0.94/263/)
        HBASE-5918 Master will block forever at startup if root server dies between assigning root and assigning meta

        Submitted by:Chunhui
        Reviewed by:Stack, Ted, Ram (Revision 1352229)

        Result = SUCCESS
        ramkrishna :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        ramkrishna.s.vasudevan added a comment -

        Resolving after committing it to 0.94.

        ramkrishna.s.vasudevan added a comment -

        Patch for 0.94.

        Hudson added a comment -

        Integrated in HBase-TRUNK #3048 (See https://builds.apache.org/job/HBase-TRUNK/3048/)
        HBASE-5918 Master will block forever at startup if root server dies between assigning root and assigning meta (Chunhui) (Revision 1352161)

        Result = SUCCESS
        tedyu :
        Files :

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        Ted Yu added a comment -

        Patch v2 integrated to trunk.

        Thanks for the patch, Chunhui.

        Thanks for the review, Stack and Ram.

        ramkrishna.s.vasudevan added a comment -

        This issue needs to be committed. I feel there are a few more like this that need to be integrated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12525408/HBASE-5918V2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1742//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1742//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1742//console

        This message is automatically generated.

        chunhui shen added a comment -

        In the v2 patch, I made a single method that sets the flag and then calls expireDeadNotExpiredServers. We now call this method only once.
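The grouping described in this comment might look roughly like the following. This is a hedged sketch, not the contents of the v2 patch: only the name `expireDeadNotExpiredServers` comes from the discussion, while the class `EnableOnceSketch`, the method `enableServerShutdownHandler`, the flag check, and the list-based bookkeeping are assumptions made for illustration. The flag check is what keeps a second call from processing the same dead server twice.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; every name except expireDeadNotExpiredServers is hypothetical. */
public class EnableOnceSketch {
    private boolean shutdownHandlerEnabled = false;
    // Servers that died while the handler was disabled.
    private final List<String> deadNotExpired = new ArrayList<>();
    // Servers whose shutdown has been processed, in order.
    final List<String> processed = new ArrayList<>();

    void recordDeadServer(String server) {
        deadNotExpired.add(server);
    }

    /** Sets the flag and drains the queued servers together; repeat calls are no-ops. */
    void enableServerShutdownHandler() {
        if (shutdownHandlerEnabled) {
            return;
        }
        shutdownHandlerEnabled = true;
        expireDeadNotExpiredServers();
    }

    /** Processes each queued dead server exactly once, then clears the queue. */
    private void expireDeadNotExpiredServers() {
        processed.addAll(deadNotExpired);
        deadNotExpired.clear();
    }

    public static void main(String[] args) {
        EnableOnceSketch master = new EnableOnceSketch();
        master.recordDeadServer("rs1");
        master.enableServerShutdownHandler(); // e.g. right after assigning meta
        master.enableServerShutdownHandler(); // e.g. again at the end of initialization
        System.out.println(master.processed); // [rs1]
    }
}
```

Keeping the flag and the drain in one method means callers cannot set one without the other, which is the tracking concern raised in the review.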

        stack added a comment -

        @Chunhui I suppose so.... It's just odd to have flags set in such different locations... It makes the tracking of stuff difficult. At a minimum I'd think we'd make a single method that sets the flag and then calls expireDeadNotExpiredServers so they are grouped...

        What about the call to expireDeadNotExpiredServers that is done twice? On the first call, we'd possibly process the server that was carrying root. What happens when we call it again later in finishInitialization? Could we end up processing the same server twice?

        Thanks.

        chunhui shen added a comment -

        "Shouldn't we remove the setting of this flag that happens later in finishInitialization?"

        In the patch, the flag is set inside an if block, so we should keep the setting of this flag that happens later in finishInitialization.

        ramkrishna.s.vasudevan added a comment -

        HBASE-5916 is also due to this. I will try working on a test case.

        stack added a comment -

        Shouldn't we remove the setting of this flag that happens later in finishInitialization?

        +1 on the patch otherwise. This is good stuff.

        Any chance of a test? It looks like it'd be hard to get one in here, but it would be good if you fellas at least said why a test is hard to squeeze in here, to show you at least tried figuring out how to test this stuff. The flag disabling the shutdown handler was only added recently, but here we find an issue w/ it already.

        Author: Michael Stack <stack@apache.org>  2012-03-13 08:35:54
        Committer: Michael Stack <stack@apache.org>  2012-03-13 08:35:54
        Parent: fbd4bebd5cca129f49e91ec9936f604998a7025a (HBASE-5314 racefully rolling restart region servers in rolling-restart.sh)
        Child:  59e5460807a1dc0fb5763e4b12dda4be49ef3bb4 (HBASE-5574 DEFAULT_MAX_FILE_SIZE defaults to a negative value)
        Branches: 094.testfail, 5833trunk, hanging, pbwork, remotes/origin/instant_schema_alter, remotes/origin/trunk, v10, v4, v6
        Follows: 
        Precedes: 
        
            HBASE-5179 Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
            
            git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1300194 13f79535-47bb-0310-9956-ffa450edef68
        
        ----- src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java -----
        inde
        
        ramkrishna.s.vasudevan added a comment -

        +1 on patch.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12525381/HBASE-5918.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1736//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1736//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1736//console

        This message is automatically generated.

        Ted Yu added a comment -

        @Chunhui:
        I used a script to analyze https://builds.apache.org/job/PreCommit-HBASE-Build/1720/console but didn't find a hanging test.
        You may need to run through the test suite yourself so that you can find out which test hangs.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12525282/HBASE-5918.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1720//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1720//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1720//console

        This message is automatically generated.

        Ted Yu added a comment -

        Patch makes sense.


          People

          • Assignee:
            chunhui shen
            Reporter:
            chunhui shen
          • Votes:
            0
            Watchers:
            4
