From 0ecfdb2a0e146c7e58695afdd940d5c105b8f261 Mon Sep 17 00:00:00 2001
From: Misty Stanley-Jones
Date: Tue, 6 Oct 2015 15:17:12 +1000
Subject: [PATCH] HBASE-14558 Documenmt ChaosMonkey enhancements from HBASE-14261

---
 src/main/asciidoc/_chapters/developer.adoc | 101 +++++++++++++++++++----------
 1 file changed, 67 insertions(+), 34 deletions(-)

diff --git a/src/main/asciidoc/_chapters/developer.adoc b/src/main/asciidoc/_chapters/developer.adoc
index d13ca21..163d47b 100644
--- a/src/main/asciidoc/_chapters/developer.adoc
+++ b/src/main/asciidoc/_chapters/developer.adoc
@@ -1202,16 +1202,19 @@ _/etc/init.d/_ scripts are not supported for now, but it can be easily added.
 For other deployment options, a ClusterManager can be implemented and plugged in.
 
 [[maven.build.commands.integration.tests.destructive]]
-==== Destructive integration / system tests
+==== Destructive integration / system tests (ChaosMonkey)
 
-In 0.96, a tool named `ChaosMonkey` has been introduced.
-It is modeled after the link:http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html[same-named tool by Netflix].
-Some of the tests use ChaosMonkey to simulate faults in the running cluster in the way of killing random servers, disconnecting servers, etc.
-ChaosMonkey can also be used as a stand-alone tool to run a (misbehaving) policy while you are running other tests.
+HBase 0.96 introduced a tool named `ChaosMonkey`, modeled after the
+link:http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html[same-named tool by Netflix].
+ChaosMonkey simulates real-world faults in a running cluster by killing or disconnecting
+random servers, or injecting other failures into the environment. You can use ChaosMonkey
+as a stand-alone tool to run a policy while other tests are running. In some environments,
+ChaosMonkey is always running, in order to constantly check that high availability and
+fault tolerance are working as expected.
 
-ChaosMonkey defines Action's and Policy's.
-Actions are sequences of events.
-We have at least the following actions:
+ChaosMonkey defines *Actions* and *Policies*.
+
+Actions:: Actions are predefined sequences of events, such as the following:
 
 * Restart active master (sleep 5 sec)
 * Restart random regionserver (sleep 5 sec)
@@ -1221,23 +1224,17 @@ We have at least the following actions:
 * Batch restart of 50% of regionservers (sleep 5 sec)
 * Rolling restart of 100% of regionservers (sleep 5 sec)
 
-Policies on the other hand are responsible for executing the actions based on a strategy.
-The default policy is to execute a random action every minute based on predefined action weights.
-ChaosMonkey executes predefined named policies until it is stopped.
-More than one policy can be active at any time.
-
-To run ChaosMonkey as a standalone tool deploy your HBase cluster as usual.
-ChaosMonkey uses the configuration from the bin/hbase script, thus no extra configuration needs to be done.
-You can invoke the ChaosMonkey by running:
+Policies:: A policy is a strategy for executing one or more actions. The default policy
+executes a random action every minute based on predefined action weights.
+A given policy will be executed until ChaosMonkey is interrupted.
 
-[source,bourne]
------
-bin/hbase org.apache.hadoop.hbase.util.ChaosMonkey
------
-
-This will output something like:
+Most ChaosMonkey actions are configured to have reasonable defaults, so you can run
+ChaosMonkey against an existing cluster without any additional configuration. The
+following example runs ChaosMonkey with the default configuration:
 
+[source,bash]
 ----
+$ bin/hbase org.apache.hadoop.hbase.util.ChaosMonkey
 12/11/19 23:21:57 INFO util.ChaosMonkey: Using ChaosMonkey Policy: class org.apache.hadoop.hbase.util.ChaosMonkey$PeriodicRandomActionPolicy, period:60000
 12/11/19 23:21:57 INFO util.ChaosMonkey: Sleeping for 26953 to add jitter
@@ -1276,31 +1273,38 @@ This will output something like:
 12/11/19 23:24:27 INFO util.ChaosMonkey: Started region server:rs3.example.com,60020,1353367027826. Reported num of rs:6
 ----
 
-As you can see from the log, ChaosMonkey started the default PeriodicRandomActionPolicy, which is configured with all the available actions, and ran RestartActiveMaster and RestartRandomRs actions.
-ChaosMonkey tool, if run from command line, will keep on running until the process is killed.
+The output indicates that ChaosMonkey started the default `PeriodicRandomActionPolicy`
+policy, which is configured with all the available actions. It chose to run the `RestartActiveMaster` and `RestartRandomRs` actions.
+
+==== Available Policies
+HBase ships with several ChaosMonkey policies, available in the
+`hbase/hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/policies/` directory.
 
 [[chaos.monkey.properties]]
-==== Passing individual Chaos Monkey per-test Settings/Properties
+==== Configuring Individual ChaosMonkey Actions
 
-Since HBase version 1.0.0 (link:https://issues.apache.org/jira/browse/HBASE-11348[HBASE-11348]), the chaos monkeys is used to run integration tests can be configured per test run.
-Users can create a java properties file and and pass this to the chaos monkey with timing configurations.
-The properties file needs to be in the HBase classpath.
-The various properties that can be configured and their default values can be found listed in the `org.apache.hadoop.hbase.chaos.factories.MonkeyConstants` class.
-If any chaos monkey configuration is missing from the property file, then the default values are assumed.
-For example:
+Since HBase version 1.0.0 (link:https://issues.apache.org/jira/browse/HBASE-11348[HBASE-11348]),
+ChaosMonkey integration tests can be configured per test run. Create a Java properties
+file in the HBase classpath and pass it to ChaosMonkey using the `-monkeyProps`
+configuration flag. Configurable properties, along with their default values if
+applicable, are listed in the `org.apache.hadoop.hbase.chaos.factories.MonkeyConstants`
+class. For properties that have defaults, you can override them by including them
+in your properties file.
+
+The following example uses a properties file called <<monkey.properties>>.
 
 [source,bourne]
 ----
-
-$bin/hbase org.apache.hadoop.hbase.IntegrationTestIngest -m slowDeterministic -monkeyProps monkey.properties
+$ bin/hbase org.apache.hadoop.hbase.IntegrationTestIngest -m slowDeterministic -monkeyProps monkey.properties
 ----
 
 The above command will start the integration tests and chaos monkey passing the properties file _monkey.properties_.
 Here is an example chaos monkey file:
 
+[[monkey.properties]]
+.Example ChaosMonkey Properties File
 [source]
 ----
-
 sdm.action1.period=120000
 sdm.action2.period=40000
 move.regions.sleep.time=80000
@@ -1309,6 +1313,35 @@ move.regions.sleep.time=80000
 batch.restart.rs.ratio=0.4f
 ----
 
+HBase 1.0.2 and newer add the ability to restart HBase's underlying ZooKeeper quorum or
+HDFS nodes. These actions require additional properties, which have no reasonable
+defaults because they are deployment-specific. Set them in your ChaosMonkey properties
+file, which may be `hbase-site.xml` or a different properties file.
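+
+[source,xml]
+----
+<property>
+  <name>hbase.it.clustermanager.hadoop.home</name>
+  <value>$HADOOP_HOME</value>
+</property>
+<property>
+  <name>hbase.it.clustermanager.zookeeper.home</name>
+  <value>$ZOOKEEPER_HOME</value>
+</property>
+<property>
+  <name>hbase.it.clustermanager.hbase.user</name>
+  <value>hbase</value>
+</property>
+<property>
+  <name>hbase.it.clustermanager.hadoop.hdfs.user</name>
+  <value>hdfs</value>
+</property>
+<property>
+  <name>hbase.it.clustermanager.zookeeper.user</name>
+  <value>zookeeper</value>
+</property>
+----
+
 [[developing]]
 == Developer Guidelines
 
-- 
2.3.8 (Apple Git-58)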
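
Note on the per-run properties file documented in this patch: both the old and new text say that settings in the file take precedence and that anything left out falls back to a default. The sketch below illustrates that layering with plain `java.util.Properties`. It is not how ChaosMonkey itself loads its configuration; the class name `MonkeyPropsSketch`, the method `loadWithDefaults`, and the default values are hypothetical and chosen only for illustration. The real defaults are the ones listed in `MonkeyConstants`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class MonkeyPropsSketch {

    /**
     * Layers the settings parsed from fileContents over the given defaults.
     * Keys present in the file win; keys absent from it resolve to the defaults.
     */
    public static Properties loadWithDefaults(Properties defaults, String fileContents) {
        // A Properties object built this way consults `defaults` on a lookup miss.
        Properties props = new Properties(defaults);
        try {
            props.load(new StringReader(fileContents));
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked for simplicity.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Placeholder defaults, NOT the real MonkeyConstants values.
        Properties defaults = new Properties();
        defaults.setProperty("sdm.action1.period", "60000");
        defaults.setProperty("move.regions.sleep.time", "800");

        // A subset of the example monkey.properties file from the patch.
        String file = "sdm.action1.period=120000\n"
                    + "batch.restart.rs.ratio=0.4f\n";

        Properties props = loadWithDefaults(defaults, file);
        System.out.println(props.getProperty("sdm.action1.period"));      // overridden by the file
        System.out.println(props.getProperty("move.regions.sleep.time")); // falls back to the default
        System.out.println(props.getProperty("batch.restart.rs.ratio"));  // only present in the file
    }
}
```

A key that appears in neither the file nor the defaults resolves to `null`. That case corresponds to the deployment-specific `hbase.it.clustermanager.*` properties added by this patch, which have no defaults and must be set explicitly.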