HBASE-4298

Support to drain RS nodes through ZK

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.90.4
    • Fix Version/s: 0.92.0
    • Component/s: master
    • Labels:
    • Environment:

      all

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Just as HDFS currently has a way to exclude certain datanodes and prevent them from getting new blocks, this feature adds marking regionservers so they will not get new regions if you add a regionserver to the draining nodes directory in zk. These draining znodes look exactly the same as the corresponding nodes in /rs, except they live under /draining. This patch adds watching of /draining and the blocking of region assignment to draining nodes; it does not provide means of writing the draining znode (use zkcli).

      Description

      HDFS currently has a way to exclude certain datanodes and prevent them from getting new blocks. HDFS goes one step further and even drains these nodes for you. This enhancement is a step in that direction.

      The idea is that we mark nodes in zookeeper as draining nodes. This means that they don't get any more new regions. These draining nodes look exactly the same as the corresponding nodes in /rs, except they live under /draining.

      Eventually, support for draining them can be added. I am submitting two patches for review - one for the 0.90 branch and one for trunk (in git).

      Here are the two patches:
      0.90 - https://github.com/aravind/hbase/commit/181041e72e7ffe6a4da6d82b431ef7f8c99e62d2

      trunk - https://github.com/aravind/hbase/commit/e127b25ae3b4034103b185d8380f3b7267bc67d5

      I have tested both these patches and they work as advertised.
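
      For anyone trying this out before command-line tooling exists, here is a minimal sketch of marking a server as draining with the raw ZooKeeper client. It assumes the default /hbase base znode and uses a made-up server name; zkcli works just as well.

          import org.apache.zookeeper.CreateMode;
          import org.apache.zookeeper.ZooDefs;
          import org.apache.zookeeper.ZooKeeper;

          public class MarkDraining {
            public static void main(String[] args) throws Exception {
              // Hypothetical server name; it must exactly match the znode
              // under /hbase/rs (host,port,startcode).
              String server = "rs1.example.com,60020,1316000000000";
              ZooKeeper zk = new ZooKeeper("zkhost.example.com:2181", 30000, null);
              try {
                // Draining znodes mirror the /rs entries but live under /draining.
                zk.create("/hbase/draining/" + server, new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
              } finally {
                zk.close();
              }
            }
          }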

      1. trunk_with_test.txt
        22 kB
        Aravind Gottipati
      2. 4298-trunk-v3.txt
        22 kB
        stack
      3. drainingservertest-v2.txt
        5 kB
        stack
      4. drainingservertest.txt
        5 kB
        stack
      5. 4298-trunk-v2.txt
        14 kB
        stack
      6. trunk_hbase.patch
        14 kB
        Aravind Gottipati
      7. 90_hbase.patch
        13 kB
        Aravind Gottipati

        Issue Links

          Activity

          Aravind Gottipati created issue -
          Todd Lipcon made changes -
          Field Original Value New Value
          Link This issue relates to HBASE-3833 [ HBASE-3833 ]
          stack made changes -
          Priority Minor [ 4 ] Critical [ 2 ]
          stack made changes -
          Fix Version/s 0.92.0 [ 12314223 ]
          Fix Version/s 0.90.5 [ 12317145 ]
          Fix Version/s 0.90.4 [ 12316406 ]
          stack made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Ted Yu added a comment -

          I reviewed the patch for trunk.

          For AssignmentManager.java:

          +        ", exclude=" + drainingServers + ") available servers");
          

          I think we only need to log the number of draining servers.

          For ServerManager.java:

          +  /** Map of region servers that should not get any more new regions */
          +  private final Map<ServerName, HServerLoad> drainingServers =
          +    new ConcurrentHashMap<ServerName, HServerLoad>();
          

          The javadoc should state that keys of the map are region servers.

          I think removeServerFromDrainList() should return a boolean. ServerManager.isServerOnline(sn) should be used instead of checking HServerLoad. If sn isn't online, the method should return false. Otherwise true is returned.

          You may consider doing similar action in addServerToDrainList().

          I wonder if Map is needed for drainingServers because it is private and getDrainingServersList() only returns the keySet.

          For DrainingServerTracker.java, please remove the year.
          The handling of calling this.serverManager methods is different between add() and remove(): one is inside a synchronized block, one outside. Is there a reason?

          More to follow.
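
          For illustration, a sketch of the removeServerFromDrainList() shape suggested above (hypothetical, not the committed code):

            /** Hypothetical sketch: refuse to act on servers that are not online. */
            public boolean removeServerFromDrainList(final ServerName sn) {
              if (!isServerOnline(sn)) {
                return false; // not online, per the suggestion above
              }
              this.drainingServers.remove(sn);
              return true;
            }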

          Ted Yu added a comment -

          For nodeChildrenChanged(), please change the message in the catch block for IOException; it mentions a zk exception.

          For ZooKeeperWatcher.java:

          +        conf.get("zookeeper.znode.draining", "draining"));
          

          I think a better name may be "zookeeper.znode.draining.rs".

          Can you write some unit tests for this feature?
          Please also share your experience from using this in your environment.
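
          For illustration, the rename suggested above might look like this inside ZooKeeperWatcher (hypothetical, mirroring how the other znode names are read from the configuration):

            drainingZNode = ZKUtil.joinZNode(baseZNode,
                conf.get("zookeeper.znode.draining.rs", "draining"));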

          stack added a comment -

          @Aravind Should remove the createNode rather than just comment it out. Nice feature. Do nodes get poked into /draining by an external process? Let's work on a unit test for this stuff (and address Ted's comments above). When you get a chance, stick in some of your experience running this patch here.

          Aravind Gottipati added a comment -

          @Ted: Thank you for the review. I made some changes and updated my patch (in github). Notes in line.

          • I think we only need to log the number of draining servers.
          • The javadoc should state that keys of the map are region servers.
          • For DrainingServerTracker.java, please remove the year.
          • For nodeChildrenChanged(), please change the message in the catch block for IOException; it mentioned a zk exception.
          • Should remove the createNode rather than just comment it out.
          • The handling of serverManager methods is different between add() and remove(): one inside a synchronized block, one outside.
          • I think a better name may be "zookeeper.znode.draining.rs".

          I agree with all of these and they are all fixed in my latest code push (on github).

          • I wonder if Map is needed for drainingServers because it is private and getDrainingServersList() only returns the keySet.

          The map isn't required, but I followed the example of onlineServers and serverConnections. For the code in trunk, I have changed it to an ArrayList. A similar change does not work (easily) in the 0.90 branch: the code in AssignmentManager uses HServerInfo in 0.90, and changing drainingServers to an array list would mean key lookups etc. I have left it as a Map in 0.90, but I changed it to a list in trunk.

          • removeServerFromDrainList / addServerToDrainList should return a boolean.

          The remove and add methods are called from DrainingServerTracker. The context is a ZK callback, and the corresponding remove and add functions there simply return void. I changed the code to return booleans in trunk, but left it as void in the 0.90 branch. I figured they might actually be used in trunk, but I doubt they will be in 0.90.

          • Unit tests..

          I will work with Stack and get the tests to you.

          • Share your experience from using this in your environment.

          To reboot the cluster, we currently drain one server at a time (using the graceful stop shell script). This process takes forever to go through all the servers. The goal here is to enable us to drain multiple servers simultaneously. Doing this by keeping track of servers externally makes the programming painful, and we'd have to share state somehow between different scripts that all aim to drain different servers. Leaving this list in ZK and having HBase keep those servers from getting new regions seems like the right way to go about it. I have tested this in a test cluster of about 14 servers. This code by itself only solves one part of our problem. The rest of it will be solved by command line scripts that will create nodes to be shut down under /draining/rs in ZK, and then move regions out from them. (A sketch of the multi-server case follows below.)

          Please let me know if you have any other questions about this stuff.
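
          For the multi-server case, a hypothetical sketch with the plain ZooKeeper client (imports as in the earlier sketch; the region moves themselves would still be driven by external scripts):

            ZooKeeper zk = new ZooKeeper("zkhost.example.com:2181", 30000, null);
            try {
              // Hypothetical batch: mark several servers as draining at once.
              for (String server : new String[] {
                  "rs1.example.com,60020,1316000000000",
                  "rs2.example.com,60020,1316000000001" }) {
                zk.create("/hbase/draining/" + server, new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
              }
            } finally {
              zk.close();
            }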

          Aravind Gottipati added a comment -

          Well.. JIRA lost all my formatting in my last comment.. I hope it still makes sense.

          The latest changesets are:

          https://github.com/aravind/hbase/commit/46f3b58c60f4f1c81806fdad6e606badf84fc30c for trunk.

          https://github.com/aravind/hbase/commit/e6cf9ecf78f8e0d6f46c2a77a524e6bccec45001 for 0.90.

          Ted Yu added a comment -

          @Aravind:
          Reading plain text version in my mailbox isn't hard at all.

          Thanks for taking care of my review comments. Appreciate it.

          Can you attach the two patches to this JIRA or publish them on reviewboard?
          That way you can get more helpful comments and I can run test suite over them.

          Good job.

          Aravind Gottipati added a comment -

          Patch files for trunk and 0.90.

          Aravind Gottipati made changes -
          Attachment 90_hbase.patch [ 12496595 ]
          Attachment trunk_hbase.patch [ 12496596 ]
          Ted Yu added a comment -

          @Aravind:
          Is it possible to come up with some unit test for this feature?

          Thanks

          Ted Yu added a comment -

          https://reviews.apache.org/r/2063/ for trunk and https://reviews.apache.org/r/2064/ for 0.90

          stack added a comment -

          Here are the minor items addressed on Aravind's patch.

          stack made changes -
          Attachment 4298-trunk-v2.txt [ 12501479 ]
          stack added a comment -

          There are comments over in https://reviews.apache.org/r/2063/. In particular there is this one:

          "What about balancing? I see no consideration of draining servers in the balance algorithm. I suppose you have it disabled when this is all running Aravind, is that so?"

          stack made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          stack made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12501479/4298-trunk-v2.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 javadoc. The javadoc tool appears to have generated -166 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.master.TestDistributedLogSplitting

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/102//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/102//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/102//console

          This message is automatically generated.

          stack added a comment -

          The TDLS failed because of:

          2011-10-30 00:16:16,142 ERROR [org.apache.hadoop.hdfs.server.datanode.DataXceiver@15bc53] datanode.DataXceiver(131): DatanodeRegistration(127.0.0.1:50070, storageID=DS-1489265021-67.195.138.20-50070-1319933763131, infoPort=37761, ipcPort=47077):DataXceiver
          java.net.SocketException: Too many open files
          
          Ming Ma added a comment -

          Just in case, there was another code review at https://reviews.apache.org/r/2064. It seems some of the questions have been answered by Aravind.

          stack added a comment -

          Thanks Ming. Let me chase Aravind today.

          Aravind Gottipati added a comment -

          I haven't had the time to figure out how to write unit tests for this, but here are the test cases I ran through.

          Set up a cluster with 5 or more region servers and a bunch of regions.

          1. Mark a server (server A) as draining (using zkCli) and then shut down a different server (server B) - check to make sure that server A does not get any more new regions.
          2. Mark a server (server A) as draining (using zkCli) and then go through all the regions on that server and assign them (from the cli); they should all end up on different servers, and by the end server A should not have any regions left on it.
          3. Mark a server (server A) as draining (using zkCli), disable the balancer and drain server A (as in step 2). Then enable the balancer again and verify that server A does not get any new regions.
          4. Remove the server (server A) from the draining list, and check that it gets regions when the balancer next runs.

          I ran through other scenarios/tests like this, but with multiple servers in the draining list. We could probably repeat the above tests with two servers in the drain list. A rough sketch of scenario 1 as a unit test follows.
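
          A rough sketch of scenario 1 (assumptions: an HBaseTestingUtility mini cluster and a crude sleep instead of proper cluster-status polling; this is not the test that was eventually committed):

            import static org.junit.Assert.assertEquals;

            import org.apache.hadoop.hbase.HBaseTestingUtility;
            import org.apache.hadoop.hbase.MiniHBaseCluster;
            import org.apache.hadoop.hbase.ServerName;
            import org.apache.hadoop.hbase.zookeeper.ZKUtil;
            import org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher;
            import org.junit.Test;

            public class DrainingServerSketch {
              @Test
              public void testDrainingServerGetsNoNewRegions() throws Exception {
                HBaseTestingUtility util = new HBaseTestingUtility();
                util.startMiniCluster(5);
                try {
                  MiniHBaseCluster cluster = util.getMiniHBaseCluster();
                  ServerName serverA = cluster.getRegionServer(0).getServerName();
                  int regionsOnA = cluster.getRegionServer(0).getOnlineRegions().size();

                  // Mark server A as draining: its znode under /draining mirrors /rs.
                  ZooKeeperWatcher zkw = cluster.getMaster().getZooKeeper();
                  ZKUtil.createAndFailSilent(zkw,
                      ZKUtil.joinZNode(zkw.baseZNode + "/draining", serverA.toString()));

                  // Shut down a different server (server B) and give the master
                  // time to reassign B's regions.
                  cluster.stopRegionServer(1);
                  cluster.waitOnRegionServer(1);
                  Thread.sleep(10000);

                  // Server A must not have picked up any of B's regions.
                  assertEquals(regionsOnA,
                      cluster.getRegionServer(0).getOnlineRegions().size());
                } finally {
                  util.shutdownMiniCluster();
                }
              }
            }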

          Jonathan Gray added a comment -

          I think this should be for 0.94 since it's a new feature. I also think a prerequisite to commit is a unit test.

          stack added a comment -

          Sketch of a test; need to verify it's actually properly testing. Will make a combined patch later today.

          stack made changes -
          Attachment drainingservertest.txt [ 12502060 ]
          stack added a comment -

          Almost done. Let me write another test.

          stack made changes -
          Attachment drainingservertest-v2.txt [ 12502063 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12502063/drainingservertest-v2.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -164 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 46 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/147//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/147//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/147//console

          This message is automatically generated.

          Ted Yu added a comment -

          There was compilation error:

          [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:testCompile (default-testCompile) on project hbase: Compilation failure: Compilation failure:
          [ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java:[78,4] not a statement
          [ERROR] 
          [ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java:[78,5] ';' expected
          [ERROR] -> [Help 1]
          
          stack added a comment -

          Eh.. yeah. I know what I attached.

          Interesting, though, that the patch build ran this anyway even though I did not cancel the old patch, only submitted a new one. That's ok. Just something unexpected.

          stack added a comment -

          Combined patch – Aravind's plus test. Marked as not intended for inclusion because Aravind needs to do the granting.

          stack made changes -
          Attachment 4298-trunk-v3.txt [ 12502103 ]
          stack made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          stack made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12502103/4298-trunk-v3.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -164 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 46 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.regionserver.wal.TestLogRolling
          org.apache.hadoop.hbase.master.TestDistributedLogSplitting
          org.apache.hadoop.hbase.TestGlobalMemStoreSize
          org.apache.hadoop.hbase.master.TestMasterFailover

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/152//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/152//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/152//console

          This message is automatically generated.

          stack added a comment -

          I ran TestLogRolling locally and can't get it to fail. It failed with 'Cannot lock storage /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/7f164eb7-a297-40fb-89ff-6bd9a0e331db/dfscluster_e7b1f239-5bca-46c6-abfb-f8237c09a7ce/dfs/data/data3. The directory is already locked.'

          TestDistributedLogSplitting fails with too many open files but ulimit for files says 32k?

          asf011.sp2.ygridcore.net
          core file size          (blocks, -c) 0
          data seg size           (kbytes, -d) unlimited
          scheduling priority             (-e) 20
          file size               (blocks, -f) unlimited
          pending signals                 (-i) 16382
          max locked memory       (kbytes, -l) 64
          max memory size         (kbytes, -m) unlimited
          open files                      (-n) 32768
          pipe size            (512 bytes, -p) 8
          POSIX message queues     (bytes, -q) 819200
          real-time priority              (-r) 0
          stack size              (kbytes, -s) 8192
          cpu time               (seconds, -t) unlimited
          max user processes              (-u) 2048
          virtual memory          (kbytes, -v) unlimited
          file locks                      (-x) unlimited
          32768
          Running in Jenkins mode
          

          This is good to go I'd say.

          Aravind Gottipati added a comment -

          Patch against trunk with Stack's test case.

          Aravind Gottipati made changes -
          Attachment trunk_with_test.txt [ 12502184 ]
          stack added a comment -

          Committed to branch and trunk. Thanks for the patch Aravind.

          This is a new feature that has been running for months now at my place of employ. Our ops like it (and dev'd it).

          stack made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Resolution Fixed [ 1 ]
          stack added a comment -

          Reopening. It was applied to 0.92 but we might want to apply to 0.90... moving it over there.

          stack made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          stack made changes -
          Release Note Just as HDFS currently has a way to exclude certain datanodes and prevent them from getting new blocks, this feature adds marking regionservers so they will not get new regions if you add a regionserver to the draining nodes directory in zk. These draining znodes look exactly the same as the corresponding nodes in /rs, except they live under /draining. This patch adds watching of /draining and the blocking of region assignment to draining nodes; it does not provide means of writing the draining znode (use zkcli).
          Fix Version/s 0.92.0 [ 12314223 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12502184/trunk_with_test.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -164 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 46 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.master.TestDistributedLogSplitting

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/160//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/160//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/160//console

          This message is automatically generated.

          Hudson added a comment -

          Integrated in HBase-TRUNK #2407 (See https://builds.apache.org/job/HBase-TRUNK/2407/)
          HBASE-4298 Support to drain RS nodes through ZK

          stack :
          Files :

          • /hbase/trunk/CHANGES.txt
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/DrainingServerTracker.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
          Hudson added a comment -

          Integrated in HBase-0.92 #105 (See https://builds.apache.org/job/HBase-0.92/105/)
          HBASE-4298 Support to drain RS nodes through ZK

          stack :
          Files :

          • /hbase/branches/0.92/CHANGES.txt
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/DrainingServerTracker.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
          stack made changes -
          Fix Version/s 0.90.6 [ 12319200 ]
          Fix Version/s 0.90.5 [ 12317145 ]
          ramkrishna.s.vasudevan added a comment -

          Moving into 0.90.7

          ramkrishna.s.vasudevan made changes -
          Fix Version/s 0.90.7 [ 12319481 ]
          Fix Version/s 0.90.6 [ 12319200 ]
          ramkrishna.s.vasudevan added a comment -

          @Stack
          This issue has gone into 0.92 and trunk. As it is a new feature, do you want it to go into future 0.90 releases? If not, can we remove the 0.90 fix versions?

          ramkrishna.s.vasudevan made changes -
          Fix Version/s 0.92.0 [ 12314223 ]
          stack added a comment -

          Removed 0.90.7 as a fix version.

          stack made changes -
          Fix Version/s 0.90.7 [ 12319481 ]
          Dave Latham added a comment -

          Any reason this issue is still open?

          stack added a comment -

          Committed to trunk and 0.92 a long time ago. Resolving. Thanks for the patch, Aravind.

          stack made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Jonathan Hsieh made changes -
          Link This issue blocks HBASE-6344 [ HBASE-6344 ]
          Gavin made changes -
          Link This issue blocks HBASE-6344 [ HBASE-6344 ]
          Gavin made changes -
          Link This issue is depended upon by HBASE-6344 [ HBASE-6344 ]
          Jonathan Hsieh added a comment -

          Adding a link to the blog entry by aravind that explains how to use this: http://inchoate-clatter.blogspot.com/2012/03/hbase-ops-automation.html


            People

            • Assignee:
              Unassigned
              Reporter:
              Aravind Gottipati
            • Votes:
              0
              Watchers:
              5
