Index: src/docbkx/ops_mgt.xml
===================================================================
--- src/docbkx/ops_mgt.xml (revision 1178433)
+++ src/docbkx/ops_mgt.xml (working copy)
@@ -89,13 +89,26 @@
--peer.adr=server1,server2,server3:2181:/hbase TestTable
+
+ Export
+ Export is a utility that dumps the contents of a table to HDFS in a sequence file. Invoke via:
+$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
+
+
+
+
+ Import
+ Import is a utility that loads data previously written by Export back into HBase. Invoke via:
+$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
+
+
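As a rough illustration of how Export and Import pair up, the sketch below builds the two command lines from the usage strings above. The table name TestTable is borrowed from the CopyTable example; the output path /backup/TestTable and the helper names export_cmd and import_cmd are hypothetical. The helpers only print the commands, so nothing touches a cluster until the printed line is actually run.

```shell
#!/bin/sh
# Hypothetical helpers: print (do not execute) Export/Import invocations.

# export_cmd <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
export_cmd() {
  if [ "$#" -lt 2 ] || [ "$#" -gt 5 ]; then
    echo "usage: export_cmd <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]" >&2
    return 1
  fi
  echo "bin/hbase org.apache.hadoop.hbase.mapreduce.Export $*"
}

# import_cmd <tablename> <inputdir>
import_cmd() {
  if [ "$#" -ne 2 ]; then
    echo "usage: import_cmd <tablename> <inputdir>" >&2
    return 1
  fi
  echo "bin/hbase org.apache.hadoop.hbase.mapreduce.Import $*"
}

# Round trip: the directory written by Export is the input to Import.
export_cmd TestTable /backup/TestTable
import_cmd TestTable /backup/TestTable
```

The same directory is passed to both commands because Import reads what Export wrote.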
+ RowCounter
+ RowCounter is a utility that will count all the rows of a table. This is a good utility to use
as a sanity check to ensure that HBase can read all the blocks of a table if there are any concerns of metadata inconsistency.
-$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter
+$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename> [<column1> <column2>...]
-
@@ -240,8 +253,51 @@
HBase Backup
- See HBase Backup Options over on the Sematext Blog.
+ There are two broad strategies for performing HBase backups: backing up with a full cluster shutdown, and backing up on a live cluster.
+ Each approach has pros and cons.
+ For additional information, see HBase Backup Options over on the Sematext Blog.
+
+ Full Shutdown Backup
+ Some environments can tolerate a periodic full shutdown of their HBase cluster, for example if it is being used in a back-end analytic capacity
+ and not serving front-end web-pages. The benefit is that the NameNode, Master, and RegionServers are down, so there is no chance of missing
+ any in-flight changes to either StoreFiles or metadata. The obvious con is that the cluster is down. The steps include:
+
+ Stop HBase
+
+
+
+ Backup NameNode
+
+
+
+ Distcp
+ Distcp can be used to copy the contents of the hbase directory in HDFS either to the same cluster or to a different cluster.
+
+ Note: Distcp works in this situation because the cluster is down and there are no in-flight edits to files.
+ Using Distcp on the hbase directory is not recommended on a live cluster.
+
+
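The three steps above can be sketched as a small plan printer. This is a minimal sketch under stated assumptions: the HDFS URIs are hypothetical, the helper name shutdown_backup_plan is made up for illustration, and the NameNode backup is left as a placeholder because the procedure is site-specific and not described here. The function prints the steps rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: print the full-shutdown backup steps in order.
# usage: shutdown_backup_plan <source-hbase-dir-uri> <destination-uri>
shutdown_backup_plan() {
  if [ "$#" -ne 2 ]; then
    echo "usage: shutdown_backup_plan <src-uri> <dst-uri>" >&2
    return 1
  fi
  # Step 1: stop HBase so no edits are in flight.
  echo "bin/stop-hbase.sh"
  # Step 2: back up the NameNode (placeholder; procedure is site-specific).
  echo "# back up NameNode metadata here"
  # Step 3: copy the hbase directory with Distcp.
  echo "hadoop distcp $1 $2"
}

# Example URIs are assumptions, not taken from any real deployment.
shutdown_backup_plan hdfs://nn1:8020/hbase hdfs://nn2:8020/hbase-backup
```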
+
+ Live Cluster Backup - Replication
+ This approach assumes that there is a second cluster.
+ See the HBase page on replication for more information.
+
+
+ Live Cluster Backup - CopyTable
+ The CopyTable utility can be used either to copy data from one table to another on the
+ same cluster, or to copy data to another table on another cluster.
+
+ Since the cluster is up, there is a risk that edits could be missed in the copy process.
+
+
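The two CopyTable cases might look like the sketch below. The --peer.adr value and the table name TestTable are taken from the CopyTable example earlier in this file; the --new.name option comes from CopyTable's usage message and should be checked against the installed version, and the target table name TestTableCopy and the helper name copytable_cmd are hypothetical. As with the other sketches, the helper only prints the command.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) a CopyTable invocation.
# usage: copytable_cmd [<option>...] <tablename>
copytable_cmd() {
  if [ "$#" -lt 1 ]; then
    echo "usage: copytable_cmd [<option>...] <tablename>" >&2
    return 1
  fi
  echo "bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable $*"
}

# Same cluster: copy TestTable into a differently named table.
copytable_cmd --new.name=TestTableCopy TestTable

# Another cluster: point at the peer cluster's ZooKeeper quorum.
copytable_cmd --peer.adr=server1,server2,server3:2181:/hbase TestTable
```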
+ Live Cluster Backup - Export
+ The Export approach dumps the content of a table to HDFS on the same cluster. To restore the data, the
+ Import utility would be used.
+
+ Since the cluster is up, there is a risk that edits could be missed in the export process.
+
+