Index: src/docbkx/ops_mgt.xml
===================================================================
--- src/docbkx/ops_mgt.xml (revision 0)
+++ src/docbkx/ops_mgt.xml (revision 0)
@@ -0,0 +1,237 @@
+
+
+ HBase Operational Management
+ This chapter will cover operational tools and practices required of a running HBase cluster.
+ The subject of operations is related to the topics of , ,
+ and but is a distinct topic in itself.
+
+
+ HBase Tools and Utilities
+
+ Here we list HBase tools for administration, analysis, fixup, and
+ debugging.
+
+ HBase hbck
+ An fsck for your HBase install
+ To run hbck against your HBase cluster run
+ $ ./bin/hbase hbck
+ At the end of the command's output it prints OK
+ or INCONSISTENCY. If your cluster reports
+ inconsistencies, pass -details to see more detail emitted.
+ If there are inconsistencies, run hbck a few times because the
+ inconsistency may be transient (e.g. the cluster is starting up or a region is
+ splitting).
+ Passing -fix may correct the inconsistency (this
+ is an experimental feature).
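+ For example, to see full detail on the reported inconsistencies and then attempt
+ a repair (the -fix pass is the experimental feature noted above), you might run:
+ $ ./bin/hbase hbck -details
+ $ ./bin/hbase hbck -fix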
+
+
+ HFile Tool
+ See .
+
+
+ WAL Tools
+
+
+ HLog tool
+
+ The main method on HLog offers manual
+ split and dump facilities. Pass it WALs or the product of a split, the
+ content of the recovered.edits directory.
+
+ You can get a textual dump of a WAL file's content by doing the
+ following:$ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012The
+ return code will be non-zero if there are issues with the file, so you can test
+ the wholesomeness of the file by redirecting STDOUT to
+ /dev/null and testing the program's return code.
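+ For example, to check a file without reading the dump itself, you might redirect the
+ output and test the exit code (the path below is the same example file as above; a
+ zero exit code suggests no issues were found):
+ $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012 > /dev/null
+ $ echo $?
+ 0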
+
+ Similarly you can force a split of a log file directory by
+ doing: $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --split hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/
+
+
+ Compression Tool
+ See .
+
+
+ CopyTable
+
+ CopyTable is a utility that can copy part or all of a table, either to the same cluster or to another cluster. The usage is as follows:
+$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable [--rs.class=CLASS] [--rs.impl=IMPL] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] tablename
+
+
+
+ Options:
+
+ rs.class hbase.regionserver.class of the peer cluster. Specify if different from current cluster.
+ rs.impl hbase.regionserver.impl of the peer cluster.
+ starttime Beginning of the time range. Without endtime, the range runs from starttime to forever.
+ endtime End of the time range.
+ new.name New table's name.
+ peer.adr Address of the peer cluster given in the format hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent
+ families Comma-separated list of ColumnFamilies to copy.
+
+ Args:
+
+ tablename Name of table to copy.
+
+
+ Example of copying 'TestTable' to a cluster that uses replication for a 1 hour window:
+$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable
+--rs.class=org.apache.hadoop.hbase.ipc.ReplicationRegionInterface
+--rs.impl=org.apache.hadoop.hbase.regionserver.replication.ReplicationRegionServer
+--starttime=1265875194289 --endtime=1265878794289
+--peer.adr=server1,server2,server3:2181:/hbase TestTable
+
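+ As a simpler, hypothetical example, copying just the 'cf1' family of 'TestTable' into a new
+ table named 'TestTableCopy' on the same cluster might look like the following (the table and
+ family names are made up for illustration):
+$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=TestTableCopy --families=cf1 TestTable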
+
+
+
+ Node Management
+ Node Decommission
+ You can stop an individual RegionServer by running the following
+ script in the HBase directory on the particular node:
+ $ ./bin/hbase-daemon.sh stop regionserver
+ The RegionServer will first close all regions and then shut itself down.
+ On shutdown, the RegionServer's ephemeral node in ZooKeeper will expire.
+ The master will notice the RegionServer gone and will treat it as
+ a 'crashed' server; it will reassign the regions the RegionServer was carrying.
+ Disable the Load Balancer before Decommissioning a node
+ If the load balancer runs while a node is shutting down, then
+ there could be contention between the Load Balancer and the
+ Master's recovery of the just decommissioned RegionServer.
+ Avoid any problems by disabling the balancer first.
+ See below.
+
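+ Putting the two together, a minimal decommission of a single node might look like the
+ following (run the first command from anywhere with an HBase client, and the second on
+ the node being stopped):
+ $ echo "balance_switch false" | ./bin/hbase shell
+ $ ./bin/hbase-daemon.sh stop regionserver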
+
+
+
+ A downside to the above stop of a RegionServer is that regions could be offline for
+ a good period of time. Regions are closed in order. If there are many regions on the server, the
+ first region to close may not be back online until all regions close and the master
+ notices the RegionServer's znode is gone. HBase 0.90.2 added a facility for having
+ a node gradually shed its load and then shut itself down: the
+ graceful_stop.sh script. Here is its usage:
+ $ ./bin/graceful_stop.sh
+Usage: graceful_stop.sh [--config &lt;conf-dir&gt;] [--restart] [--reload] [--thrift] [--rest] &lt;hostname&gt;
+ thrift If we should stop/start thrift before/after the hbase stop/start
+ rest If we should stop/start rest before/after the hbase stop/start
+ restart If we should restart after graceful stop
+ reload Move offloaded regions back on to the stopped server
+ debug Print full debug output (set -x)
+ hostname Hostname of server we are to stop
+
+
+ To decommission a loaded RegionServer, run the following:
+ $ ./bin/graceful_stop.sh HOSTNAME
+ where HOSTNAME is the host carrying the RegionServer
+ you would decommission.
+ On HOSTNAME
+ The HOSTNAME passed to graceful_stop.sh
+ must match the hostname that hbase is using to identify RegionServers.
+ Check the list of RegionServers in the master UI for how HBase is
+ referring to servers. It is usually a hostname but can also be an FQDN.
+ Whatever HBase is using, this is what you should pass to the
+ graceful_stop.sh decommission
+ script. If you pass an IP, the script is not yet smart enough to make
+ a hostname (or FQDN) of it, so it will fail when it checks whether the server is
+ currently running; the graceful unloading of regions will not run.
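+ As a quick sanity check, you can compare what the node reports as its name against what
+ the master UI lists before running the script (the hostname below is hypothetical):
+ $ hostname
+ rs3.example.com
+ $ ./bin/graceful_stop.sh rs3.example.com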
+
+ The graceful_stop.sh script will move the regions off the
+ decommissioned RegionServer one at a time to minimize region churn.
+ It will verify the region is deployed in the new location before it
+ moves the next region, and so on, until the decommissioned server
+ is carrying zero regions. At this point, the graceful_stop.sh script
+ tells the RegionServer to stop. The master will at this point notice the
+ RegionServer is gone, but all regions will have already been redeployed,
+ and because the RegionServer went down cleanly, there will be no
+ WAL logs to split.
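+ The per-region moves the script performs are essentially what you could do by hand with
+ the HBase shell's move command; for example, moving one region (the region and server
+ names below are placeholders) looks like:
+ hbase(main):001:0> move 'ENCODED_REGION_NAME', 'HOST,PORT,STARTCODE'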
+ Load Balancer
+
+ It is assumed that the Region Load Balancer is disabled while the
+ graceful_stop script runs (otherwise the balancer
+ and the decommission script will end up fighting over region deployments).
+ Use the shell to disable the balancer:
+ hbase(main):001:0> balance_switch false
+true
+0 row(s) in 0.3590 seconds
+This turns the balancer OFF. To reenable, do:
+ hbase(main):001:0> balance_switch true
+false
+0 row(s) in 0.3590 seconds
+
+
+
+
+
+ Rolling Restart
+
+ You can also ask this script to restart a RegionServer after the shutdown
+ AND move its old regions back into place. The latter you might do to
+ retain data locality. A primitive rolling restart might be effected by
+ running something like the following:
+ $ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart --reload --debug $i; done &> /tmp/log.txt &
+
+ Tail the output of /tmp/log.txt to follow the script's
+ progress. The above does RegionServers only. Be sure to disable the
+ load balancer before doing the above. You'd need to do the master
+ update separately. Do it before you run the above script.
+ Here is a pseudo-script for how you might craft a rolling restart; a consolidated shell sketch follows the list:
+
+ Untar your release, verify its configuration, and
+ then rsync it across the cluster. If this is 0.90.2, patch it
+ with HBASE-3744 and HBASE-3756.
+
+
+
+ Run hbck to ensure the cluster is consistent
+ $ ./bin/hbase hbck
+ Effect repairs if inconsistent.
+
+
+
+ Restart the Master: $ ./bin/hbase-daemon.sh stop master; ./bin/hbase-daemon.sh start master
+
+
+
+
+ Disable the region balancer:$ echo "balance_switch false" | ./bin/hbase shell
+
+
+
+ Run the graceful_stop.sh script per RegionServer. For example:
+ $ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart --reload --debug $i; done &> /tmp/log.txt &
+
+ If you are running thrift or rest servers on the RegionServer, pass the --thrift or --rest options (see the usage
+ for the graceful_stop.sh script).
+
+
+
+ Restart the Master again. This will clear out the dead servers list and reenable the balancer.
+
+
+
+ Run hbck to ensure the cluster is consistent.
+
+
+
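+ Putting the steps above together, a minimal consolidated sketch (assuming the default
+ bin/ and conf/ layout, and that the new release has already been rsynced into place)
+ might look like the following; treat it as a starting point rather than a turnkey script:
+#!/usr/bin/env bash
+# Sanity-check cluster state before starting; repair first if hbck reports inconsistencies.
+./bin/hbase hbck
+# Restart the Master so it picks up the new code.
+./bin/hbase-daemon.sh stop master; ./bin/hbase-daemon.sh start master
+# Disable the region balancer so it does not fight with graceful_stop.sh.
+echo "balance_switch false" | ./bin/hbase shell
+# Gracefully restart each RegionServer, reloading its regions afterwards.
+for i in `cat conf/regionservers|sort`; do
+  ./bin/graceful_stop.sh --restart --reload --debug $i
+done &> /tmp/log.txt
+# Restart the Master again to clear the dead servers list and reenable the balancer.
+./bin/hbase-daemon.sh stop master; ./bin/hbase-daemon.sh start master
+# Verify the cluster is consistent when done.
+./bin/hbase hbck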
+
+
+
+
+
+ HBase Monitoring
+ TODO
+
+
+
+ HBase Backup
+ See HBase Backup Options over on the Sematext Blog.
+
+
+
+
Index: src/docbkx/book.xml
===================================================================
--- src/docbkx/book.xml (revision 1162226)
+++ src/docbkx/book.xml (working copy)
@@ -1403,216 +1403,7 @@
-
-
- Tools
-
- Here we list HBase tools for administration, analysis, fixup, and
- debugging.
-
- HBase hbck
- An fsck for your HBase install
- To run hbck against your HBase cluster run
- $ ./bin/hbase hbck
- At the end of the commands output it prints OK
- or INCONSISTENCY. If your cluster reports
- inconsistencies, pass -details to see more detail emitted.
- If inconsistencies, run hbck a few times because the
- inconsistency may be transient (e.g. cluster is starting up or a region is
- splitting).
- Passing -fix may correct the inconsistency (This latter
- is an experimental feature).
-
-
- HFile Tool
- See .
-
-
- WAL Tools
-
-
- HLog tool
-
- The main method on HLog offers manual
- split and dump facilities. Pass it WALs or the product of a split, the
- content of the recovered.edits. directory.
-
- You can get a textual dump of a WAL file content by doing the
- following:$ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012The
- return code will be non-zero if issues with the file so you can test
- wholesomeness of file by redirecting STDOUT to
- /dev/null and testing the program return.
-
- Similarily you can force a split of a log file directory by
- doing: $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --split hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/
-
-
- Compression Tool
- See .
-
- Node Decommission
- You can stop an individual RegionServer by running the following
- script in the HBase directory on the particular node:
- $ ./bin/hbase-daemon.sh stop regionserver
- The RegionServer will first close all regions and then shut itself down.
- On shutdown, the RegionServer's ephemeral node in ZooKeeper will expire.
- The master will notice the RegionServer gone and will treat it as
- a 'crashed' server; it will reassign the nodes the RegionServer was carrying.
- Disable the Load Balancer before Decommissioning a node
- If the load balancer runs while a node is shutting down, then
- there could be contention between the Load Balancer and the
- Master's recovery of the just decommissioned RegionServer.
- Avoid any problems by disabling the balancer first.
- See below.
-
-
-
-
- A downside to the above stop of a RegionServer is that regions could be offline for
- a good period of time. Regions are closed in order. If many regions on the server, the
- first region to close may not be back online until all regions close and after the master
- notices the RegionServer's znode gone. In HBase 0.90.2, we added facility for having
- a node gradually shed its load and then shutdown itself down. HBase 0.90.2 added the
- graceful_stop.sh script. Here is its usage:
- $ ./bin/graceful_stop.sh
-Usage: graceful_stop.sh [--config &conf-dir>] [--restart] [--reload] [--thrift] [--rest] &hostname>
- thrift If we should stop/start thrift before/after the hbase stop/start
- rest If we should stop/start rest before/after the hbase stop/start
- restart If we should restart after graceful stop
- reload Move offloaded regions back on to the stopped server
- debug Move offloaded regions back on to the stopped server
- hostname Hostname of server we are to stop
-
-
- To decommission a loaded RegionServer, run the following:
- $ ./bin/graceful_stop.sh HOSTNAME
- where HOSTNAME is the host carrying the RegionServer
- you would decommission.
- On HOSTNAME
- The HOSTNAME passed to graceful_stop.sh
- must match the hostname that hbase is using to identify RegionServers.
- Check the list of RegionServers in the master UI for how HBase is
- referring to servers. Its usually hostname but can also be FQDN.
- Whatever HBase is using, this is what you should pass the
- graceful_stop.sh decommission
- script. If you pass IPs, the script is not yet smart enough to make
- a hostname (or FQDN) of it and so it will fail when it checks if server is
- currently running; the graceful unloading of regions will not run.
-
- The graceful_stop.sh script will move the regions off the
- decommissioned RegionServer one at a time to minimize region churn.
- It will verify the region deployed in the new location before it
- will moves the next region and so on until the decommissioned server
- is carrying zero regions. At this point, the graceful_stop.sh
- tells the RegionServer stop. The master will at this point notice the
- RegionServer gone but all regions will have already been redeployed
- and because the RegionServer went down cleanly, there will be no
- WAL logs to split.
- Load Balancer
-
- It is assumed that the Region Load Balancer is disabled while the
- graceful_stop script runs (otherwise the balancer
- and the decommission script will end up fighting over region deployments).
- Use the shell to disable the balancer:
- hbase(main):001:0> balance_switch false
-true
-0 row(s) in 0.3590 seconds
-This turns the balancer OFF. To reenable, do:
- hbase(main):001:0> balance_switch true
-false
-0 row(s) in 0.3590 seconds
-
-
-
-
- Rolling Restart
-
- You can also ask this script to restart a RegionServer after the shutdown
- AND move its old regions back into place. The latter you might do to
- retain data locality. A primitive rolling restart might be effected by
- running something like the following:
- $ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart --reload --debug $i; done &> /tmp/log.txt &
-
- Tail the output of /tmp/log.txt to follow the scripts
- progress. The above does RegionServers only. Be sure to disable the
- load balancer before doing the above. You'd need to do the master
- update separately. Do it before you run the above script.
- Here is a pseudo-script for how you might craft a rolling restart script:
-
- Untar your release, make sure of its configuration and
- then rsync it across the cluster. If this is 0.90.2, patch it
- with HBASE-3744 and HBASE-3756.
-
-
-
- Run hbck to ensure the cluster consistent
- $ ./bin/hbase hbck
- Effect repairs if inconsistent.
-
-
-
- Restart the Master: $ ./bin/hbase-daemon.sh stop master; ./bin/hbase-daemon.sh start master
-
-
-
-
- Disable the region balancer:$ echo "balance_switch false" | ./bin/hbase shell
-
-
-
- Run the graceful_stop.sh script per RegionServer. For example:
- $ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart --reload --debug $i; done &> /tmp/log.txt &
-
- If you are running thrift or rest servers on the RegionServer, pass --thrift or --rest options (See usage
- for graceful_stop.sh script).
-
-
-
- Restart the Master again. This will clear out dead servers list and reenable the balancer.
-
-
-
- Run hbck to ensure the cluster is consistent.
-
-
-
-
-
-
-
-
- CopyTable
-
- CopyTable is a utility that can copy part or of all of a table, either to the same cluster or another cluster. The usage is as follows:
-$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable [--rs.class=CLASS] [--rs.impl=IMPL] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] tablename
-
-
-
- Options:
-
- rs.class hbase.regionserver.class of the peer cluster. Specify if different from current cluster.
- rs.impl hbase.regionserver.impl of the peer cluster.
- starttime Beginning of the time range. Without endtime means starttime to forever.
- endtime End of the time range. Without endtime means starttime to forever.
- new.name New table's name.
- peer.adr Address of the peer cluster given in the format hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent
- families Comma-separated list of ColumnFamilies to copy.
-
- Args:
-
- tablename Name of table to copy.
-
-
- Example of copying 'TestTable' to a cluster that uses replication for a 1 hour window:
-$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable
---rs.class=org.apache.hadoop.hbase.ipc.ReplicationRegionInterface
---rs.impl=org.apache.hadoop.hbase.regionserver.replication.ReplicationRegionServer
---starttime=1265875194289 --endtime=1265878794289
---peer.adr=server1,server2,server3:2181:/hbase TestTable
-
-
-
-
+
@@ -1852,7 +1643,7 @@
- See HBase Backup Options over on the Sematext Blog.
+ See