HBase / HBASE-10333

Assignments are not retained on a cluster start

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.98.0, 0.96.1.1
    • Fix Version/s: 0.98.0, 0.96.2, 0.99.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      When a cluster is fully shut down and then started again with hbase.master.startup.retainassign set to true, I noticed that the assignments are not retained. On digging, it seems that HBASE-10101 introduced a change that adds the server previously holding META to the dead-servers list (in HMaster.assignMeta). Later, this makes the AssignmentManager think that the master recovered from a failure rather than that the cluster started fresh (the ServerManager.deadServers list is not empty in the check within AssignmentManager.processDeadServersAndRegionsInTransition).
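
The decision described above can be illustrated with a small toy model. This is only a sketch, not HBase code: the class `StartupModeSketch` and its methods are hypothetical, and the check is reduced to the single condition named in the description (a non-empty dead-servers set), whereas the real AssignmentManager logic considers more than that.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model (not HBase code) of the startup decision described in this issue. */
public class StartupModeSketch {
    // Hypothetical stand-in for the ServerManager.deadServers list.
    private final Set<String> deadServers = new HashSet<>();

    /** Models the step where the previous META host is recorded as dead. */
    public void markDead(String serverName) {
        deadServers.add(serverName);
    }

    /** Simplified check: any known dead server is read as a master failover. */
    public boolean looksLikeFailover() {
        return !deadServers.isEmpty();
    }

    /** In this model, assignments are retained only on a clean start with the flag on. */
    public boolean retainsAssignments(boolean retainAssignFlag) {
        return retainAssignFlag && !looksLikeFailover();
    }

    public static void main(String[] args) {
        // Clean cluster start: nothing in the dead-servers set.
        StartupModeSketch cleanStart = new StartupModeSketch();
        System.out.println(cleanStart.retainsAssignments(true)); // prints true

        // Same start, but the old META host was added to the dead servers first.
        StartupModeSketch afterMetaMarkedDead = new StartupModeSketch();
        afterMetaMarkedDead.markDead("rs1.example.com,60020,1389000000000");
        System.out.println(afterMetaMarkedDead.retainsAssignments(true)); // prints false
    }
}
```

The second case mirrors the reported symptom: a single entry in the dead-servers set is enough for the startup to be treated as a failover, so the retain-assignment setting has no effect.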

        Activity

        Hudson added a comment -

        SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #56 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/56/)
        HBASE-10333 Assignments are not retained on a cluster start (jxiang: rev 1558963)

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        Hudson added a comment -

        FAILURE: Integrated in hbase-0.96-hadoop2 #177 (See https://builds.apache.org/job/hbase-0.96-hadoop2/177/)
        HBASE-10333 Assignments are not retained on a cluster start (jxiang: rev 1558965)

        • /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        • /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        Hudson added a comment -

        SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #80 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/80/)
        HBASE-10333 Assignments are not retained on a cluster start (jxiang: rev 1558964)

        • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        Hudson added a comment -

        SUCCESS: Integrated in HBase-TRUNK #4828 (See https://builds.apache.org/job/HBase-TRUNK/4828/)
        HBASE-10333 Assignments are not retained on a cluster start (jxiang: rev 1558963)

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        Hudson added a comment -

        FAILURE: Integrated in hbase-0.96 #260 (See https://builds.apache.org/job/hbase-0.96/260/)
        HBASE-10333 Assignments are not retained on a cluster start (jxiang: rev 1558965)

        • /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        • /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.98 #88 (See https://builds.apache.org/job/HBase-0.98/88/)
        HBASE-10333 Assignments are not retained on a cluster start (jxiang: rev 1558964)

        • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        Jimmy Xiang added a comment -

        Integrated into trunk, 0.98, and 0.96. Thanks.

        Jimmy Xiang added a comment -

        Thanks all for the review. I really want to have a unit test. However, the mini cluster can't retain the same region server ports on restart, so we can't check if region locations are retained. I will keep this in mind, and get back to it when I get a chance later on.

        Devaraj Das added a comment -

        +1 based on my testing. One consideration: should we have a unit test that checks this behavior?

        Andrew Purtell added a comment -

        +1 for 0.98

        Matteo Bertozzi added a comment -

        +1 looks good to me

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12623179/hbase-10333.patch
        against trunk revision .
        ATTACHMENT ID: 12623179

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        -1 site. The patch appears to cause mvn site goal to fail.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8436//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8436//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8436//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8436//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8436//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8436//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8436//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8436//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8436//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8436//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8436//console

        This message is automatically generated.

        Devaraj Das added a comment -

        Jimmy Xiang, thanks. Will get back to you soon.

        Jimmy Xiang added a comment -

        Devaraj Das, Jeffrey Zhong, could you please take a look at the patch? I tested on my cluster and the assignments are retained with the patch. I tried to add a unit test, but the mini cluster doesn't use the same ports on restart.

        Devaraj Das added a comment -

        This issue is present in 0.96.1+.

        I think the issue is guaranteed to happen in those cluster-restart cases where the node that previously hosted meta is absent from the restarted cluster, or where the regionserver process comes up on a different port on the same node after the restart.
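
The two restart cases named above can be sketched as a lookup over server names. This is a hedged illustration, not HBase's implementation: it assumes server names in the `hostname,port,startcode` string form, and the class `ServerIdentitySketch`, its methods, and the rule of matching on host and port while ignoring the start code are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model (not HBase code) of checking whether the old META host came back. */
public class ServerIdentitySketch {
    /** Extracts "host:port" from a "hostname,port,startcode" string. */
    public static String hostAndPort(String serverName) {
        String[] parts = serverName.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected hostname,port,startcode: " + serverName);
        }
        return parts[0] + ":" + parts[1];
    }

    /** True if some online server has the same host and port as the previous META host. */
    public static boolean previousHostIsBack(String previousMetaServer, List<String> onlineServers) {
        String wanted = hostAndPort(previousMetaServer);
        for (String online : onlineServers) {
            if (hostAndPort(online).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String oldMeta = "rs1.example.com,60020,1389000000000";

        // Same host and port, new start code after the restart.
        List<String> samePort = Arrays.asList("rs1.example.com,60020,1389999999999");
        System.out.println(previousHostIsBack(oldMeta, samePort)); // prints true

        // Regionserver came back on a different port on the same node.
        List<String> newPort = Arrays.asList("rs1.example.com,60021,1389999999999");
        System.out.println(previousHostIsBack(oldMeta, newPort)); // prints false

        // The node that hosted META is missing from the restarted cluster.
        List<String> otherNode = Arrays.asList("rs2.example.com,60020,1389999999999");
        System.out.println(previousHostIsBack(oldMeta, otherNode)); // prints false
    }
}
```

The different-port case is also why a mini-cluster unit test is hard to write for this issue: if the ports change on restart, no restarted server matches the previous location.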

        Jimmy Xiang added a comment -

        Yes, 0.94 is safe. The issue happens only in 0.96+, and when the meta is in transition while the cluster is restarted.

        Vladimir Rodionov added a comment -

        This was introduced in 0.96, I presume? So, we are safe in 0.94?

        Jimmy Xiang added a comment -

        Let me take a look.


          People

          • Assignee: Jimmy Xiang
          • Reporter: Devaraj Das
          • Votes: 0
          • Watchers: 9
