HBase
  1. HBase
  2. HBASE-6381

AssignmentManager should use the same logic for clean startup and failover

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.95.0
    • Component/s: master
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently AssignmentManager handles clean startup and failover very differently.
      Different logic is mingled together so it is hard to find out which is for which.

      We should clean it up and share the same logic so that AssignmentManager handles
      both cases the same way. This way, the code will much easier to understand and
      maintain.

      1. trunk-6381_v9.patch
        137 kB
        Jimmy Xiang
      2. trunk-6381_v8.patch
        138 kB
        Jimmy Xiang
      3. trunk-6381_v7.patch
        137 kB
        Jimmy Xiang
      4. trunk-6381_v5.patch
        122 kB
        Jimmy Xiang
      5. hbase-6381-notes.pdf
        172 kB
        Jimmy Xiang
      6. hbase-6381.pdf
        141 kB
        Jimmy Xiang

        Issue Links

          Activity

          Hide
          stack added a comment -

          Marking closed.

          Show
          stack added a comment - Marking closed.
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #191 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/191/)
          HBASE-6381 AssignmentManager should use the same logic for clean startup and failover (Revision 1389561)

          Result = FAILURE
          jxiang :
          Files :

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/CreateTableHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #191 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/191/ ) HBASE-6381 AssignmentManager should use the same logic for clean startup and failover (Revision 1389561) Result = FAILURE jxiang : Files : /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/CreateTableHandler.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #3375 (See https://builds.apache.org/job/HBase-TRUNK/3375/)
          HBASE-6381 AssignmentManager should use the same logic for clean startup and failover (Revision 1389561)

          Result = FAILURE
          jxiang :
          Files :

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/CreateTableHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK #3375 (See https://builds.apache.org/job/HBase-TRUNK/3375/ ) HBASE-6381 AssignmentManager should use the same logic for clean startup and failover (Revision 1389561) Result = FAILURE jxiang : Files : /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/CreateTableHandler.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
          Hide
          Jimmy Xiang added a comment -

          Integrated into trunk. Thanks all for the review.

          Show
          Jimmy Xiang added a comment - Integrated into trunk. Thanks all for the review.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12546226/trunk-6381_v9.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 21 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 140 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2924//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2924//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2924//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2924//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2924//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2924//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2924//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12546226/trunk-6381_v9.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 21 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. -1 javadoc. The javadoc tool appears to have generated 140 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2924//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2924//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2924//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2924//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2924//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2924//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2924//console This message is automatically generated.
          Hide
          Jimmy Xiang added a comment -

          The latest patch addressed Rajesh's comments. Thanks Ram for reviewing the fix for that.

          Will commit it tomorrow if no objection.

          Show
          Jimmy Xiang added a comment - The latest patch addressed Rajesh's comments. Thanks Ram for reviewing the fix for that. Will commit it tomorrow if no objection.
          Hide
          Jimmy Xiang added a comment -

          @Ram, thanks a lot for the review. As to the removal from regionInfoIterator, it is used locally to track unassigned regions so that we know if the bulk assign is completed so I think it is needed.

          As to HBASE-6228, I added the following to the fixupDaughters in HMaster before actually fixing anything in order to avoid duplicated processing:

                if (!serverManager.isServerDead(sn)) { // Otherwise, let SSH take care of it
          

          Here sn is the parent server name. So the chance of duplicated processing should be very minimal, right?

          Show
          Jimmy Xiang added a comment - @Ram, thanks a lot for the review. As to the removal from regionInfoIterator, it is used locally to track unassigned regions so that we know if the bulk assign is completed so I think it is needed. As to HBASE-6228 , I added the following to the fixupDaughters in HMaster before actually fixing anything in order to avoid duplicated processing: if (!serverManager.isServerDead(sn)) { // Otherwise, let SSH take care of it Here sn is the parent server name. So the chance of duplicated processing should be very minimal, right?
          Hide
          ramkrishna.s.vasudevan added a comment -

          @Jimmy
          The fix for Rajesh's comment seems valid. I only have 2 questions
          ->Will these changes solve HBASE-6228 or do we still need to add some synchronization while fixupdaughters?
          ->In GeneralBulkAssigner

                while (regionInfoIterator.hasNext()) {
                  HRegionInfo hri = regionInfoIterator.next();
                  RegionState state = regionStates.getRegionState(hri);
                  if ((!regionStates.isRegionInTransition(hri) && regionStates.isRegionAssigned(hri))
                      || state.isSplit() || state.isSplitting()) {
                    regionInfoIterator.remove();
          

          This removal from regionInfoIterator may not be needed. Anyway SSH is handling this case. And also as part of HBASE-6317 EnableTableHandler will handle RIT regions and already assigned region.
          In CreateTable this problem should not happen. So we can remove this piece of code from GeneralBulkAssigner? what you feel?

          Other than that I am +1. The ZKTable change can be done in a new JIRA as you said.

          Show
          ramkrishna.s.vasudevan added a comment - @Jimmy The fix for Rajesh's comment seems valid. I only have 2 questions ->Will these changes solve HBASE-6228 or do we still need to add some synchronization while fixupdaughters? ->In GeneralBulkAssigner while (regionInfoIterator.hasNext()) { HRegionInfo hri = regionInfoIterator.next(); RegionState state = regionStates.getRegionState(hri); if ((!regionStates.isRegionInTransition(hri) && regionStates.isRegionAssigned(hri)) || state.isSplit() || state.isSplitting()) { regionInfoIterator.remove(); This removal from regionInfoIterator may not be needed. Anyway SSH is handling this case. And also as part of HBASE-6317 EnableTableHandler will handle RIT regions and already assigned region. In CreateTable this problem should not happen. So we can remove this piece of code from GeneralBulkAssigner? what you feel? Other than that I am +1. The ZKTable change can be done in a new JIRA as you said.
          Hide
          Jimmy Xiang added a comment -

          @Rajesh, good catch! I changed the corresponding code as below:

              if (deadServers != null) {
                for (Map.Entry<ServerName, List<HRegionInfo>> server: deadServers.entrySet()) {
                  ServerName serverName = server.getKey();
                  if (!serverManager.isServerDead(serverName)) {
                    serverManager.expireServer(serverName); // Let SSH do region re-assign
                  }
                }
              }
              nodes = ZKUtil.listChildrenAndWatchForNewChildren(
                this.watcher, this.watcher.assignmentZNode);
              if (!nodes.isEmpty()) {
                for (String encodedRegionName : nodes) {
                  processRegionInTransition(encodedRegionName, null, deadServers);
                }
              }
          ....
              failoverCleanupDone();
          

          Any other issues? Any place we can enhance more? We can use RB if you prefer.

          Thanks.

          Show
          Jimmy Xiang added a comment - @Rajesh, good catch! I changed the corresponding code as below: if (deadServers != null) { for (Map.Entry<ServerName, List<HRegionInfo>> server: deadServers.entrySet()) { ServerName serverName = server.getKey(); if (!serverManager.isServerDead(serverName)) { serverManager.expireServer(serverName); // Let SSH do region re-assign } } } nodes = ZKUtil.listChildrenAndWatchForNewChildren( this.watcher, this.watcher.assignmentZNode); if (!nodes.isEmpty()) { for (String encodedRegionName : nodes) { processRegionInTransition(encodedRegionName, null, deadServers); } } .... failoverCleanupDone(); Any other issues? Any place we can enhance more? We can use RB if you prefer. Thanks.
          Hide
          rajeshbabu added a comment -

          @Jimmy,
          In the following scenario region assignment may not happen with the latest patch:

          1)lets suppose a region R1 is moving from RS1 to RS2
          2)if the master and RS1 restarted before update server info for R1 with RS2 in META.(during region open in RS)
          3)in rebuild user regions we will select R1 as dead region on dead server RS1.
          4)Now server info updated in META with RS2.
          5)In processDeadServersAndRecoverLostRegions we will expiry server and delete znode of the region.

                  if (!serverManager.isServerDead(serverName)) {
                    serverManager.expireServer(serverName); // Let SSH do region re-assign
                  }
                  if (!nodes.isEmpty()) {
                    for (HRegionInfo deadRegion : server.getValue()) {
                      String encodedName = deadRegion.getEncodedName();
                      if (nodes.remove(encodedName)) {
                        ZKAssign.deleteNodeFailSilent(watcher, deadRegion);
                      }
                    }
                  }
          

          6)if the znode deletion happened before transitioning to opened,then the region wont be online.

                if (!transitionToOpened(region)) {
                  // If we fail to transition to opened, it's because of one of two cases:
                  //    (a) we lost our ZK lease
                  // OR (b) someone else opened the region before us
                  // In either case, we don't need to transition to FAILED_OPEN state.
                  // In case (a), the Master will process us as a dead server. In case
                  // (b) the region is already being handled elsewhere anyway.
                  cleanupFailedOpen(region);
                  return;
                }
          

          Even while processing SSH of RS1 also we wont assign it because in META server info got changed to RS2.

          Please correct me if wrong.

          Show
          rajeshbabu added a comment - @Jimmy, In the following scenario region assignment may not happen with the latest patch: 1)lets suppose a region R1 is moving from RS1 to RS2 2)if the master and RS1 restarted before update server info for R1 with RS2 in META.(during region open in RS) 3)in rebuild user regions we will select R1 as dead region on dead server RS1. 4)Now server info updated in META with RS2. 5)In processDeadServersAndRecoverLostRegions we will expiry server and delete znode of the region. if (!serverManager.isServerDead(serverName)) { serverManager.expireServer(serverName); // Let SSH do region re-assign } if (!nodes.isEmpty()) { for (HRegionInfo deadRegion : server.getValue()) { String encodedName = deadRegion.getEncodedName(); if (nodes.remove(encodedName)) { ZKAssign.deleteNodeFailSilent(watcher, deadRegion); } } } 6)if the znode deletion happened before transitioning to opened,then the region wont be online. if (!transitionToOpened(region)) { // If we fail to transition to opened, it's because of one of two cases: // (a) we lost our ZK lease // OR (b) someone else opened the region before us // In either case , we don't need to transition to FAILED_OPEN state. // In case (a), the Master will process us as a dead server. In case // (b) the region is already being handled elsewhere anyway. cleanupFailedOpen(region); return ; } Even while processing SSH of RS1 also we wont assign it because in META server info got changed to RS2. Please correct me if wrong.
          Hide
          stack added a comment -

          You OK w/ this last patch ramkrishna.s.vasudevan?

          Show
          stack added a comment - You OK w/ this last patch ramkrishna.s.vasudevan ?
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12545766/trunk-6381_v8.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 21 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 140 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 13 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.client.TestFromClientSide

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2899//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2899//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2899//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2899//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2899//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2899//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2899//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12545766/trunk-6381_v8.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 21 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. -1 javadoc. The javadoc tool appears to have generated 140 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 13 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestFromClientSide Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2899//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2899//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2899//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2899//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2899//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2899//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2899//console This message is automatically generated.
          Hide
          Jimmy Xiang added a comment -

          Thanks for the review. I will fix the comment before I commit.
          I will wait for hadoopqa and give other people couple days to review as well.

          Show
          Jimmy Xiang added a comment - Thanks for the review. I will fix the comment before I commit. I will wait for hadoopqa and give other people couple days to review as well.
          Hide
          stack added a comment -

          Whats different Jimmy? You putting this up on RB?

          Show
          stack added a comment - Whats different Jimmy? You putting this up on RB?
          Hide
          Jimmy Xiang added a comment -

          Need to rebase.

          Show
          Jimmy Xiang added a comment - Need to rebase.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12545590/trunk-6381_v7.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 21 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2890//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12545590/trunk-6381_v7.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 21 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2890//console This message is automatically generated.
          Hide
          Jimmy Xiang added a comment -

          Uploaded the new rebased patch + minor changes to RB: https://reviews.apache.org/r/6535/

          Show
          Jimmy Xiang added a comment - Uploaded the new rebased patch + minor changes to RB: https://reviews.apache.org/r/6535/
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12543340/hbase-6381.pdf
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2757//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12543340/hbase-6381.pdf against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2757//console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12543121/trunk-6381_v5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 111 warning messages.

          -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).

          -1 findbugs. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2739//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2739//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12543121/trunk-6381_v5.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. -1 javadoc. The javadoc tool appears to have generated 111 warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2739//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2739//console This message is automatically generated.
          Hide
          Jimmy Xiang added a comment -

          Attached is the diff 3 on RB https://reviews.apache.org/r/6535/, rebased to the latest of trunk as today.

          Will work on some doc later.

          Show
          Jimmy Xiang added a comment - Attached is the diff 3 on RB https://reviews.apache.org/r/6535/ , rebased to the latest of trunk as today. Will work on some doc later.
          Hide
          ramkrishna.s.vasudevan added a comment -

          Reviewed little remaining tomorrow will check this.

          Show
          ramkrishna.s.vasudevan added a comment - Reviewed little remaining tomorrow will check this.
          Hide
          Jimmy Xiang added a comment -

          @Ram, please review it too. It's better to catch issues before committing.

          Show
          Jimmy Xiang added a comment - @Ram, please review it too. It's better to catch issues before committing.
          Hide
          ramkrishna.s.vasudevan added a comment -

          @Jimmy
          Last 3 days i was not in town. If Stack has completed its review then no prob, pls go ahead and commit. If not i will check this today. Thanks Jimmy and Stack.

          Show
          ramkrishna.s.vasudevan added a comment - @Jimmy Last 3 days i was not in town. If Stack has completed its review then no prob, pls go ahead and commit. If not i will check this today. Thanks Jimmy and Stack.
          Hide
          Jimmy Xiang added a comment -

          Mostly, it is just addressing your comments

          Show
          Jimmy Xiang added a comment - Mostly, it is just addressing your comments
          Hide
          stack added a comment -

          Whats in v3 that is not in v2? Does it address comments?

          Show
          stack added a comment - Whats in v3 that is not in v2? Does it address comments?
          Hide
          Jimmy Xiang added a comment -

          Patch v3 was uploaded to RB (diff 2).

          Show
          Jimmy Xiang added a comment - Patch v3 was uploaded to RB (diff 2).
          Hide
          Jimmy Xiang added a comment -

          Here is the review request: https://reviews.apache.org/r/6535/

          Show
          Jimmy Xiang added a comment - Here is the review request: https://reviews.apache.org/r/6535/
          Hide
          Jimmy Xiang added a comment -

          Attached some notes about this patch.

          Show
          Jimmy Xiang added a comment - Attached some notes about this patch.
          Hide
          stack added a comment -

          @Jimmy Any chance of a page of 'design' outlining what your patch does and how you think it all should run? Will help reviewing these big patches. It will also help going forward in that we can refer to your intent when unsure about how to proceed. Does that make sense? Thanks.

          Show
          stack added a comment - @Jimmy Any chance of a page of 'design' outlining what your patch does and how you think it all should run? Will help reviewing these big patches. It will also help going forward in that we can refer to your intent when unsure about how to proceed. Does that make sense? Thanks.
          Hide
          Jimmy Xiang added a comment -

          Another change I'd like to do is to share the same bulk assign code. AM has several bulk assign functions: for startup bulk assign, for SSH bulk assign, etc. They can be shared.

          Show
          Jimmy Xiang added a comment - Another change I'd like to do is to share the same bulk assign code. AM has several bulk assign functions: for startup bulk assign, for SSH bulk assign, etc. They can be shared.
          Hide
          Jimmy Xiang added a comment -

          @Ram, currently, I was thinking about three changes. I need to make sure they are good.
          1. for the failover case, let SSH take case of those dead region servers so that we can share some code instead of doing some similar things in AM and SSH.
          2. SSH is enabled so that we can handle meta/root region server failure before joinCluster is completed. However, we can hold SSH a little bit
          for the user region assignments. So in AM, we can avoid failoverProcessedRegions which is a little bit confusing.
          3. those enablingTables and disablingTables, they should be some local variables. We can load them from ZKTable at the beginning instead of
          handling them per table.

          Show
          Jimmy Xiang added a comment - @Ram, currently, I was thinking about three changes. I need to make sure they are good. 1. for the failover case, let SSH take case of those dead region servers so that we can share some code instead of doing some similar things in AM and SSH. 2. SSH is enabled so that we can handle meta/root region server failure before joinCluster is completed. However, we can hold SSH a little bit for the user region assignments. So in AM, we can avoid failoverProcessedRegions which is a little bit confusing. 3. those enablingTables and disablingTables, they should be some local variables. We can load them from ZKTable at the beginning instead of handling them per table.
          Hide
          ramkrishna.s.vasudevan added a comment -

          @Jimmy
          Just curious to know.. Can you tel what idea you have to solve this?

          Show
          ramkrishna.s.vasudevan added a comment - @Jimmy Just curious to know.. Can you tel what idea you have to solve this?
          Hide
          Jimmy Xiang added a comment -

          Good catch. I see. That's right. Thanks!

          Show
          Jimmy Xiang added a comment - Good catch. I see. That's right. Thanks!
          Hide
          stack added a comment -

          Now during rebuildUserRegions, the rs is online but gets no region assigned.

          Is this so? The restarted RS will have a different startcode/ServerName?

          I think we should check if a rs is online, and the regions are opened on the rs too.

          Probably no harm. What if a 1k cluster w/ each server carrying hundreds of regions. What you think? It might take a little while doing this step. Would that be a prob?

          Show
          stack added a comment - Now during rebuildUserRegions, the rs is online but gets no region assigned. Is this so? The restarted RS will have a different startcode/ServerName? I think we should check if a rs is online, and the regions are opened on the rs too. Probably no harm. What if a 1k cluster w/ each server carrying hundreds of regions. What you think? It might take a little while doing this step. Would that be a prob?
          Hide
          Jimmy Xiang added a comment -

          If RS that was online dies during rebuildUserRegions, SSH will take care of it. In the scenario I described, it is like this: master dies, then rs dies,
          then rs starts up, then master fails over/starts up. Now during rebuildUserRegions, the rs is online but gets no region assigned. I think we should check if a rs is online, and the regions are opened on the rs too.

          Show
          Jimmy Xiang added a comment - If RS that was online dies during rebuildUserRegions, SSH will take care of it. In the scenario I described, it is like this: master dies, then rs dies, then rs starts up, then master fails over/starts up. Now during rebuildUserRegions, the rs is online but gets no region assigned. I think we should check if a rs is online, and the regions are opened on the rs too.
          Hide
          ramkrishna.s.vasudevan added a comment -

          @Jimmy
          Correct me if am wrong. While doing rebuildUserRegions if the RS that was online dies, any way the SSH will be able to assign it again by going thro the META.
          If the RS went down before starting with rebuildUserRegions it will not consider the RS as an alive RS.

          Show
          ramkrishna.s.vasudevan added a comment - @Jimmy Correct me if am wrong. While doing rebuildUserRegions if the RS that was online dies, any way the SSH will be able to assign it again by going thro the META. If the RS went down before starting with rebuildUserRegions it will not consider the RS as an alive RS.
          Hide
          Jimmy Xiang added a comment -

          Looked AM.rebuildUserRegions(), it thinks a region is online if the region server is online. This is not reliable. For example, a master dies, then a region server dies. At this time SSH is not running. Now, master fails over. Now all the regions in the dead server won't be assigned. Is that right?

          Show
          Jimmy Xiang added a comment - Looked AM.rebuildUserRegions(), it thinks a region is online if the region server is online. This is not reliable. For example, a master dies, then a region server dies. At this time SSH is not running. Now, master fails over. Now all the regions in the dead server won't be assigned. Is that right?
          Hide
          Jimmy Xiang added a comment -

          I checked the 0.89-fb branch, but not sure which patch is related. It may be very helpful to know how they did it. It is bad to see 0.89-fb diverge from trunk more and more.

          Show
          Jimmy Xiang added a comment - I checked the 0.89-fb branch, but not sure which patch is related. It may be very helpful to know how they did it. It is bad to see 0.89-fb diverge from trunk more and more.
          Hide
          stack added a comment -

          Agree. Talking to Prakash recently of our fb breathern, he talked of how in 89-fb, they've done work so there is little diff between the two cases.... on failover master, they just start up and do fixup on assign side (IIUC). Need to talk to the lads again to get a better picture.

          Show
          stack added a comment - Agree. Talking to Prakash recently of our fb breathern, he talked of how in 89-fb, they've done work so there is little diff between the two cases.... on failover master, they just start up and do fixup on assign side (IIUC). Need to talk to the lads again to get a better picture.
          Hide
          Jimmy Xiang added a comment -

          HBASE-6272 is the first step.

          Show
          Jimmy Xiang added a comment - HBASE-6272 is the first step.

            People

            • Assignee:
              Jimmy Xiang
              Reporter:
              Jimmy Xiang
            • Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development