HBase / HBASE-6611

Forcing region state offline cause double assignment

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.95.0
    • Component/s: master
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When assigning a region, the assignment manager forces the region state to Offline if it is not already. This can cause double assignment: for example, if the region is already assigned and in the Open state, you should not just change its state to Offline and assign it again.

      I think this could be the root cause for all double assignments IF the region state is reliable.

      After this loophole is closed, TestHBaseFsck should come up with a different way to create assignment inconsistencies, for example, calling a region server to open a region directly.
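The guard described above can be illustrated with a minimal sketch. It is not the code from the attached patches: the class `AssignGuardSketch`, the method `tryForceOffline`, and the simplified `State` enum are hypothetical names chosen for illustration, and the set of states treated as safe to re-assign from is an assumption.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified model of the master's in-memory region states. */
public class AssignGuardSketch {
    public enum State { OFFLINE, PENDING_OPEN, OPENING, OPEN, PENDING_CLOSE, CLOSING, CLOSED }

    // Assumption: only these states are safe starting points for a new assignment.
    private static final EnumSet<State> ASSIGNABLE = EnumSet.of(State.OFFLINE, State.CLOSED);

    private final Map<String, State> states = new ConcurrentHashMap<>();

    public void setState(String region, State s) { states.put(region, s); }

    public State getState(String region) { return states.get(region); }

    /**
     * Moves the region to OFFLINE and returns true only when the current
     * state is unknown or assignable; otherwise leaves the state untouched.
     */
    public boolean tryForceOffline(String region) {
        boolean[] changed = {false};
        // compute() runs atomically per key, so check and update cannot interleave.
        states.compute(region, (name, current) -> {
            if (current == null || ASSIGNABLE.contains(current)) {
                changed[0] = true;
                return State.OFFLINE;
            }
            return current; // e.g. OPEN: refuse rather than assign a second time
        });
        return changed[0];
    }

    public static void main(String[] args) {
        AssignGuardSketch am = new AssignGuardSketch();
        am.setState("region-a", State.OPEN);
        am.setState("region-b", State.CLOSED);
        System.out.println("region-a forced offline: " + am.tryForceOffline("region-a"));
        System.out.println("region-b forced offline: " + am.tryForceOffline("region-b"));
    }
}
```

The design point is that the check and the state change happen as one atomic step, so a caller that is refused knows the region may still be served elsewhere and can skip the assignment.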

      1. trunk-6611_v5.patch
        228 kB
        Jimmy Xiang
      2. trunk-6611_v2.patch
        135 kB
        Jimmy Xiang

          Activity

          stack added a comment -

          Marking closed.

          stack added a comment -

          Yah!!!

          Hudson added a comment -

          Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #229 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/229/)
          HBASE-6611 Forcing region state offline cause double assignment (Revision 1400358)

          Result = FAILURE
          jxiang :
          Files :

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/RegionTransition.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignCallable.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/OfflineCallback.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ResponseConverter.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/generated/AdminProtos.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/KeyLocker.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java
          • /hbase/trunk/hbase-server/src/main/protobuf/Admin.proto
          • /hbase/trunk/hbase-server/src/main/protobuf/ZooKeeper.proto
          • /hbase/trunk/hbase-server/src/main/ruby/shell/commands/assign.rb
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestOpenRegionHandler.java
          Hudson added a comment -

          Integrated in HBase-TRUNK #3466 (See https://builds.apache.org/job/HBase-TRUNK/3466/)
          HBASE-6611 Forcing region state offline cause double assignment (Revision 1400358)

          Result = FAILURE
          jxiang :
          Files :

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/RegionTransition.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignCallable.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/OfflineCallback.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ResponseConverter.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/generated/AdminProtos.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/KeyLocker.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java
          • /hbase/trunk/hbase-server/src/main/protobuf/Admin.proto
          • /hbase/trunk/hbase-server/src/main/protobuf/ZooKeeper.proto
          • /hbase/trunk/hbase-server/src/main/ruby/shell/commands/assign.rb
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestOpenRegionHandler.java
          Jimmy Xiang added a comment -

          Integrated into trunk. TestHBaseFsck is fine locally.

          Jimmy Xiang added a comment -

          Try hadoop qa again.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12550009/trunk-6611_v5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 82 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.util.TestHBaseFsck

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3089//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3089//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3089//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3089//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3089//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3089//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3089//console

          This message is automatically generated.

          Jimmy Xiang added a comment -

          This is the latest patch from RB, rebased to the latest trunk.

          I will commit it to trunk over the weekend if there are no objections.

          Thanks all for the review.

          ramkrishna.s.vasudevan added a comment -

          Now I am also getting in sync with the AM code in trunk. Thanks to Jimmy for making it more reliable.

          ramkrishna.s.vasudevan added a comment -

          @Jimmy
          Comments are on the review board, mainly with respect to some scenarios that are likely to happen.

          Jimmy Xiang added a comment -

          @Stack, @Ram, are we ok with the latest patch?

          Jimmy Xiang added a comment -

          The ones still open (not marked fixed or dropped) are not addressed.

          stack added a comment -

          Which did you not address?

          Jimmy Xiang added a comment -

          Patch version 4.1 is uploaded to RB: https://reviews.apache.org/r/7305/
          I have addressed most of the review comments.

          ramkrishna.s.vasudevan added a comment -

          Over the weekend I will surely check this. Thanks for your awesome work, Jimmy.

          Jimmy Xiang added a comment -

          Thanks for the review. I posted the fourth patch on RB: https://reviews.apache.org/r/7305/
          I am very confident in it now. The AM is getting stable and reliable. The patch has several enhancements mentioned on RB.

          I also filed two followup issues: HBASE-6976 and HBASE-6977.

          ramkrishna.s.vasudevan added a comment -

          Some comments on RB, Jimmy. Thanks.

          Jimmy Xiang added a comment -

          I cannot use the separate ZK watcher to watch znode changes.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12546932/trunk-6611_v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 140 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2955//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2955//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2955//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2955//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2955//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2955//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2955//console

          This message is automatically generated.

          Jimmy Xiang added a comment -

          Uploaded patch 2 to RB: https://reviews.apache.org/r/7305/
          It addressed the deadlock issue I mentioned above.

          Jimmy Xiang added a comment -

          I think another executor pool would be helpful, especially since it could also be used to assign failed regions in parallel.
          However, for now, I will probably just create another ZooKeeper watcher instead.

          Jimmy Xiang added a comment -

          Cool, thanks.

          There is one problem with the patch I am still thinking about.

          In bulk assignment, I keep the async ZK node offline for performance reasons. However, it depends on the ZK event thread's callback to know whether all nodes have been created. If the single event thread is blocked on any lock held by the bulk assigner, there will be a deadlock.

          What should we do about this?

          Instead of the async ZK node offline, I am thinking of using an executor service to offline the ZK nodes synchronously, so that we don't see too much performance degradation.
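A minimal sketch of the executor-service idea follows. It is an illustration under stated assumptions, not the approach the committed patch necessarily took: `SyncOfflineSketch`, `NodeStore`, and `createOffline` are hypothetical names, and `NodeStore` stands in for a blocking "create offline znode" call rather than any real ZooKeeper API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SyncOfflineSketch {
    /** Hypothetical stand-in for a blocking "create offline znode" call. */
    public interface NodeStore {
        boolean createOffline(String region);
    }

    /** Offlines all regions on a worker pool and returns the regions that failed. */
    public static List<String> offlineAll(List<String> regions, NodeStore store, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> failed = new ArrayList<>();
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (String region : regions) {
                Callable<Boolean> task = () -> store.createOffline(region);
                results.add(pool.submit(task));
            }
            // The caller waits on the futures directly, so completion does not
            // depend on a callback delivered by a single shared event thread.
            for (int i = 0; i < regions.size(); i++) {
                try {
                    if (!results.get(i).get(30, TimeUnit.SECONDS)) {
                        failed.add(regions.get(i));
                    }
                } catch (ExecutionException | TimeoutException e) {
                    failed.add(regions.get(i));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    failed.add(regions.get(i));
                }
            }
        } finally {
            pool.shutdownNow();
        }
        return failed;
    }
}
```

Each blocking call occupies one worker thread, so throughput depends on the pool size; the 30-second timeout is an arbitrary value chosen for the sketch.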

          ramkrishna.s.vasudevan added a comment -

          @Jimmy
          Will review this tomorrow or over the weekend. Nice work Jimmy.

          Jimmy Xiang added a comment -

          Posted a patch on RB: https://reviews.apache.org/r/7305/. Please review.

          I did some performance testing and found that the async ZooKeeper node offline is a big performance plus, so it is kept. Without this patch,
          it took around 290 seconds to bulk assign 10,339 regions to 4 region servers. With this patch, it took around 300 seconds.
          However, with this patch but without the async ZooKeeper node offline, it took around 400 seconds.

          As for force closing regions, that path is not touched and still works as expected.
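To put the timings quoted in this comment in relative terms, here is a small worked calculation. The three durations (290, 300, and 400 seconds) and the region count (10,339) come from the comment; the class and method names are made up for the example.

```java
public class BulkAssignTimings {
    /** Relative slowdown of {@code seconds} versus {@code baselineSeconds}, in percent. */
    public static double slowdownPercent(double baselineSeconds, double seconds) {
        return (seconds - baselineSeconds) / baselineSeconds * 100.0;
    }

    /** Average number of regions assigned per second. */
    public static double regionsPerSecond(int regions, double seconds) {
        return regions / seconds;
    }

    public static void main(String[] args) {
        double baseline = 290; // without the patch
        System.out.printf("patch, async offline:  +%.1f%%%n", slowdownPercent(baseline, 300));
        System.out.printf("patch, sync offline:   +%.1f%%%n", slowdownPercent(baseline, 400));
        System.out.printf("regions/sec (patched): %.1f%n", regionsPerSecond(10339, 300));
    }
}
```

With these numbers the patch costs roughly 3% with the async offline kept, versus roughly 38% without it.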

          Jimmy Xiang added a comment -

          Sure, I will do that to make sure existing functionality is not broken and there is no substantial performance drop.

          Another thing I'd like to address in this jira is that bulk assignment currently doesn't pass the version of the offlined ZK node to the region server, as regular assignment does. I think it is needed to avoid competing assignments of the same region at the same time.
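Why passing the node version helps can be shown with a small in-memory sketch of a version-checked transition. This is an assumption-laden illustration, not HBase or ZooKeeper code: `VersionedNodeSketch` and its methods are hypothetical, and the class only mimics the general idea of a conditional update that fails when the version has moved on.

```java
/** Hypothetical in-memory stand-in for a versioned assignment node. */
public class VersionedNodeSketch {
    private String state = "OFFLINE";
    private int version = 0;

    /** Master side: (re)set the node to OFFLINE and return the new version. */
    public synchronized int forceOffline() {
        state = "OFFLINE";
        version++;
        return version;
    }

    /** Region server side: transition only if nobody touched the node since. */
    public synchronized boolean transitionToOpening(int expectedVersion) {
        if (version != expectedVersion || !state.equals("OFFLINE")) {
            return false; // a newer assignment attempt owns this node
        }
        state = "OPENING";
        version++;
        return true;
    }

    public synchronized String getState() {
        return state;
    }

    public static void main(String[] args) {
        VersionedNodeSketch node = new VersionedNodeSketch();
        int first = node.forceOffline();  // first assignment attempt
        int second = node.forceOffline(); // competing, later attempt
        System.out.println("stale open accepted:   " + node.transitionToOpening(first));
        System.out.println("current open accepted: " + node.transitionToOpening(second));
    }
}
```

An open request carrying a stale version is rejected, so only the most recent assignment attempt can move the region forward.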

          Jacques added a comment -

          Reminders from the PowWow yesterday...

          JD requested that you verify that force close continues to function despite changes.

          JD & Andrew both requested that you run some performance tests to ensure that region assignment doesn't take substantially longer than 0.94. Something along the lines of bulk assignment of 10,000 regions and also checking to ensure that region failover isn't substantially longer.


            People

            • Assignee: Jimmy Xiang
            • Reporter: Jimmy Xiang
            • Votes: 0
            • Watchers: 10
