diff --git src/docbkx/developer.xml src/docbkx/developer.xml
index 1b2852e..0a945e8 100644
--- src/docbkx/developer.xml
+++ src/docbkx/developer.xml
@@ -310,11 +310,11 @@ given the dependency tree).
Unit Tests
-HBase unit tests are subdivided into three categories: small, medium and large, with
+HBase unit tests are subdivided into four categories: small, medium, large, and integration, with
corresponding JUnit categories:
SmallTests, MediumTests,
-LargeTests. JUnit categories are denoted using java annotations
-and look like this in your unit test code.
+LargeTests, IntegrationTests.
+JUnit categories are denoted using Java annotations and look like this in your unit test code.
...
@Category(SmallTests.class)
public class TestHRegionInfo {
@@ -352,13 +352,20 @@ individually. They can use a cluster, and each of them is executed in a separate
LargeTests
-Large tests are everything else. They are typically integration-like
+Large tests are everything else. They are typically large-scale
tests, regression tests for specific bugs, timeout tests, performance tests.
They are executed before a commit on the pre-integration machines. They can be run on
the developer machine as well.
+
+IntegrationTests
+Integration tests are system-level tests. See the
+integration tests section for more information.
+
+
+
Running tests
Below we describe how to run the HBase junit categories.
@@ -486,6 +493,165 @@ As most as possible, tests should use the default settings for the cluster. When
+
+
+Integration Tests
+HBase integration/system tests are tests that go beyond HBase unit tests. They
+are generally long-lasting, sizeable (the test can be sized to 1M rows or 1B rows),
+and targetable (they can take configuration that points them at the ready-made cluster
+they are to run against; integration tests do not include cluster start/stop code).
+When verifying success, integration tests rely on public APIs only; they do not
+attempt to examine server internals to assert success or failure. Integration tests
+are what you would run when you need more elaborate proofing of a release candidate
+beyond what unit tests can do. They are not generally run on the Apache Continuous Integration
+build server. However, some sites opt to run integration tests as part of their
+continuous testing on an actual cluster.
+
+
+Integration tests currently live under the src/test directory
+in the hbase-it submodule and match the pattern **/IntegrationTest*.java.
+All integration tests are also annotated with @Category(IntegrationTests.class).
+
+
+
+Integration tests can be run in two modes: using a mini cluster, or against an actual distributed cluster.
+Maven failsafe is used to run the tests using the mini cluster. The IntegrationTestsDriver class is used for
+executing the tests against a distributed cluster. Integration tests SHOULD NOT assume that they are running against a
+mini cluster, and SHOULD NOT use private APIs to access cluster state. To interact with the distributed or mini
+cluster uniformly, use the HBaseIntegrationTestingUtility and HBaseCluster classes
+and the public client APIs.
+
+
+
+Running integration tests against mini cluster
+HBase 0.92 added a verify Maven target.
+Invoking it, for example by running mvn verify, will
+run all the phases up to and including the verify phase via the
+Maven failsafe plugin,
+running all the above-mentioned HBase unit tests as well as the tests in the HBase integration test group.
+After you have completed
+ mvn install -DskipTests
+you can run just the integration tests by invoking:
+
+cd hbase-it
+mvn verify
+
+If you just want to run the integration tests from the top-level directory, you need to run two commands. First:
+ mvn failsafe:integration-test
+This actually runs ALL the integration tests.
+ This command will always output BUILD SUCCESS even if there are test failures.
+
+ At this point, you could grep the output by hand looking for failed tests. However, Maven will do this for you; just use:
+ mvn failsafe:verify
+ The above command basically looks at all the test results (so don't remove the 'target' directory) for test failures and reports the results.
+
+
+ Running a subset of Integration tests
+ This is very similar to how you specify running a subset of unit tests (see above), but use the property
+ it.test instead of test.
+To just run IntegrationTestClassXYZ.java, use:
+ mvn failsafe:integration-test -Dit.test=IntegrationTestClassXYZ
+ The next thing you might want to do is run groups of integration tests, say all integration tests that are named IntegrationTestClassX*.java:
+ mvn failsafe:integration-test -Dit.test=*ClassX*
+ This runs everything that is an integration test that matches *ClassX*. This means anything matching: "**/IntegrationTest*ClassX*".
+ You can also run multiple groups of integration tests using comma-delimited lists (similar to unit tests). Using a list of matches still supports full regex matching for each of the groups. This would look something like:
+ mvn failsafe:integration-test -Dit.test=*ClassX*,*ClassY
+
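+As a rough illustration of the selection rule described above, the following Java sketch checks relative source paths against a comma-separated list of glob patterns using the java.nio glob matcher. It is only an approximation for building intuition, not how the failsafe plugin resolves it.test internally; the class name ItTestPatternSketch, the method selected, and the example paths are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

final class ItTestPatternSketch {

    /**
     * Returns true if relativePath matches at least one of the
     * comma-separated glob patterns (whitespace around entries is ignored).
     */
    static boolean selected(String commaSeparatedGlobs, String relativePath) {
        Path path = Paths.get(relativePath);
        for (String glob : commaSeparatedGlobs.split(",")) {
            String trimmed = glob.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty entries such as a trailing comma
            }
            PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:" + trimmed);
            if (matcher.matches(path)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical paths, used only to exercise the patterns.
        String pattern = "**/IntegrationTest*ClassX*";
        System.out.println(selected(pattern, "org/example/IntegrationTestClassXYZ.java"));
        System.out.println(selected(pattern, "org/example/TestClassXYZ.java"));
    }
}
```

+Under these assumptions, a class such as IntegrationTestClassXYZ is selected by **/IntegrationTest*ClassX*, while a plain unit test named TestClassXYZ is not.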
+
+
+
+Running integration tests against distributed cluster
+
+If you already have an HBase cluster set up, you can launch the integration tests by invoking the class IntegrationTestsDriver. You may have to
+run test-compile first. The configuration will be picked up by the bin/hbase script.
+mvn test-compile
+Then launch the tests with:
+bin/hbase [--config config_dir] org.apache.hadoop.hbase.IntegrationTestsDriver
+
+This execution will launch the tests under hbase-it/src/test that have the @Category(IntegrationTests.class) annotation
+and a name starting with IntegrationTest. It uses JUnit to run the tests. Currently there is no support for running integration tests against a distributed cluster using Maven (see HBASE-6201).
+
+
+
+The tests interact with the distributed cluster by using the methods in the DistributedHBaseCluster (implementing HBaseCluster) class, which in turn uses a pluggable ClusterManager. Concrete implementations provide the actual functionality for carrying out deployment-specific and environment-dependent tasks (SSH, etc.). The default ClusterManager is HBaseClusterManager, which uses SSH to remotely execute start/stop/kill/signal commands and assumes some POSIX commands (ps, etc.). It also assumes that the user running the test has sufficient privileges to start and stop servers on the remote machines. By default, it picks up HBASE_SSH_OPTS, HBASE_HOME, and HBASE_CONF_DIR from the environment, and uses bin/hbase-daemon.sh to carry out the actions. Currently tarball deployments, deployments that use hbase-daemons.sh, and Apache Ambari deployments are supported. /etc/init.d/ scripts are not supported for now, but support can easily be added. For other deployment options, a ClusterManager can be implemented and plugged in.
+
+
+
+
+Destructive integration / system tests
+
+ In 0.96, a tool named ChaosMonkey was introduced. It is modeled after the tool of the same name by Netflix.
+Some of the tests use ChaosMonkey to simulate faults in the running cluster by killing random servers,
+disconnecting servers, etc. ChaosMonkey can also be used as a stand-alone tool to run a (misbehaving) policy while you
+are running other tests.
+
+
+
+ChaosMonkey defines Actions and Policies. Actions are sequences of events. We have at least the following actions:
+
+Restart active master (sleep 5 sec)
+Restart random regionserver (sleep 5 sec)
+Restart random regionserver (sleep 60 sec)
+Restart META regionserver (sleep 5 sec)
+Restart ROOT regionserver (sleep 5 sec)
+Batch restart of 50% of regionservers (sleep 5 sec)
+Rolling restart of 100% of regionservers (sleep 5 sec)
+
+
+Policies, on the other hand, are responsible for executing the actions based on a strategy.
+The default policy is to execute a random action every minute based on predefined action
+weights. ChaosMonkey executes predefined named policies until it is stopped. More than one
+policy can be active at any time.
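+The idea of choosing a random action according to predefined weights can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the ChaosMonkey implementation: the class WeightedActionPickerSketch, its methods, and the example weights (1 and 3) are hypothetical, and only the two action names are taken from the list above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

final class WeightedActionPickerSketch {

    /** Picks one action name with probability proportional to its weight. */
    static String pick(Map<String, Integer> weights, Random random) {
        int total = 0;
        for (int weight : weights.values()) {
            if (weight < 0) {
                throw new IllegalArgumentException("weights must be non-negative");
            }
            total += weight;
        }
        if (total == 0) {
            throw new IllegalArgumentException("at least one weight must be positive");
        }
        // Draw a ticket in [0, total) and walk the entries until it is used up.
        int ticket = random.nextInt(total);
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            ticket -= entry.getValue();
            if (ticket < 0) {
                return entry.getKey();
            }
        }
        throw new IllegalStateException("unreachable: ticket is always below the total weight");
    }

    /** Hypothetical weights for two of the actions listed above. */
    static Map<String, Integer> exampleWeights() {
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("Restart active master", 1);
        weights.put("Restart random regionserver", 3);
        return weights;
    }

    public static void main(String[] args) {
        // A periodic policy would call pick(...) once per period.
        System.out.println(pick(exampleWeights(), new Random()));
    }
}
```

+With these assumed weights, a policy that calls pick once per period would restart a random regionserver about three times as often as the active master.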
+
+
+
+ To run ChaosMonkey as a standalone tool, deploy your HBase cluster as usual. ChaosMonkey uses the configuration
+from the bin/hbase script, thus no extra configuration needs to be done. You can invoke ChaosMonkey by running:
+bin/hbase org.apache.hadoop.hbase.util.ChaosMonkey
+
+This will output something like:
+
+12/11/19 23:21:57 INFO util.ChaosMonkey: Using ChaosMonkey Policy: class org.apache.hadoop.hbase.util.ChaosMonkey$PeriodicRandomActionPolicy, period:60000
+12/11/19 23:21:57 INFO util.ChaosMonkey: Sleeping for 26953 to add jitter
+12/11/19 23:22:24 INFO util.ChaosMonkey: Performing action: Restart active master
+12/11/19 23:22:24 INFO util.ChaosMonkey: Killing master:master.example.com,60000,1353367210440
+12/11/19 23:22:24 INFO hbase.HBaseCluster: Aborting Master: master.example.com,60000,1353367210440
+12/11/19 23:22:24 INFO hbase.ClusterManager: Executing remote command: ps aux | grep master | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:master.example.com
+12/11/19 23:22:25 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
+12/11/19 23:22:25 INFO hbase.HBaseCluster: Waiting service:master to stop: master.example.com,60000,1353367210440
+12/11/19 23:22:25 INFO hbase.ClusterManager: Executing remote command: ps aux | grep master | grep -v grep | tr -s ' ' | cut -d ' ' -f2 , hostname:master.example.com
+12/11/19 23:22:25 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
+12/11/19 23:22:25 INFO util.ChaosMonkey: Killed master server:master.example.com,60000,1353367210440
+12/11/19 23:22:25 INFO util.ChaosMonkey: Sleeping for:5000
+12/11/19 23:22:30 INFO util.ChaosMonkey: Starting master:master.example.com
+12/11/19 23:22:30 INFO hbase.HBaseCluster: Starting Master on: master.example.com
+12/11/19 23:22:30 INFO hbase.ClusterManager: Executing remote command: /homes/enis/code/hbase-0.94/bin/../bin/hbase-daemon.sh --config /homes/enis/code/hbase-0.94/bin/../conf start master , hostname:master.example.com
+12/11/19 23:22:31 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:starting master, logging to /homes/enis/code/hbase-0.94/bin/../logs/hbase-enis-master-master.example.com.out
+....
+12/11/19 23:22:33 INFO util.ChaosMonkey: Started master: master.example.com,60000,1353367210440
+12/11/19 23:22:33 INFO util.ChaosMonkey: Sleeping for:51321
+12/11/19 23:23:24 INFO util.ChaosMonkey: Performing action: Restart random region server
+12/11/19 23:23:24 INFO util.ChaosMonkey: Killing region server:rs3.example.com,60020,1353367027826
+12/11/19 23:23:24 INFO hbase.HBaseCluster: Aborting RS: rs3.example.com,60020,1353367027826
+12/11/19 23:23:24 INFO hbase.ClusterManager: Executing remote command: ps aux | grep regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:rs3.example.com
+12/11/19 23:23:25 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
+12/11/19 23:23:25 INFO hbase.HBaseCluster: Waiting service:regionserver to stop: rs3.example.com,60020,1353367027826
+12/11/19 23:23:25 INFO hbase.ClusterManager: Executing remote command: ps aux | grep regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 , hostname:rs3.example.com
+12/11/19 23:23:25 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
+12/11/19 23:23:25 INFO util.ChaosMonkey: Killed region server:rs3.example.com,60020,1353367027826. Reported num of rs:6
+12/11/19 23:23:25 INFO util.ChaosMonkey: Sleeping for:60000
+12/11/19 23:24:25 INFO util.ChaosMonkey: Starting region server:rs3.example.com
+12/11/19 23:24:25 INFO hbase.HBaseCluster: Starting RS on: rs3.example.com
+12/11/19 23:24:25 INFO hbase.ClusterManager: Executing remote command: /homes/enis/code/hbase-0.94/bin/../bin/hbase-daemon.sh --config /homes/enis/code/hbase-0.94/bin/../conf start regionserver , hostname:rs3.example.com
+12/11/19 23:24:26 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:starting regionserver, logging to /homes/enis/code/hbase-0.94/bin/../logs/hbase-enis-regionserver-rs3.example.com.out
+
+12/11/19 23:24:27 INFO util.ChaosMonkey: Started region server:rs3.example.com,60020,1353367027826. Reported num of rs:6
+
+
+As you can see from the log, ChaosMonkey started the default PeriodicRandomActionPolicy, which is configured with all the available actions, and ran the RestartActiveMaster and RestartRandomRs actions. The ChaosMonkey tool, if run from the command line, will keep running until the process is killed.
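+The remote commands visible in the log output above follow a simple pattern: find the process ID of a service with ps, then optionally pipe it to kill with a signal. The Java sketch below only reproduces those command strings as they appear in the log; it is not the HBaseClusterManager source, and the class RemoteCommandSketch and its method names are hypothetical.

```java
final class RemoteCommandSketch {

    /** Builds the pipeline that prints the PID(s) of the given service. */
    static String findPidCommand(String service) {
        return "ps aux | grep " + service
            + " | grep -v grep | tr -s ' ' | cut -d ' ' -f2";
    }

    /** Builds the pipeline that sends the given signal to the service. */
    static String signalCommand(String service, String signal) {
        return findPidCommand(service) + " | xargs kill -s " + signal;
    }

    public static void main(String[] args) {
        // Same shape as the "Executing remote command" lines in the log.
        System.out.println(signalCommand("master", "SIGKILL"));
        System.out.println(findPidCommand("regionserver"));
    }
}
```

+A custom ClusterManager for another deployment style would presumably swap in different command strings at this point, for example ones that call a site-specific service script.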
+
+
+