Index: src/docbkx/ops_mgt.xml =================================================================== --- src/docbkx/ops_mgt.xml (revision 1207982) +++ src/docbkx/ops_mgt.xml (working copy) @@ -132,6 +132,30 @@ + +
+ Region Management +
+ Major Compaction + Major compactions can be requested via the HBase shell or HBaseAdmin.majorCompact. + + Note: major compactions do NOT merge regions. See for more information about compactions. + + +
+
+ Merge + Merge is a utility that can merge adjoining regions in the same table (see org.apache.hadoop.hbase.util.Merge). +$ bin/hbase org.apache.hadoop.hbase.util.Merge <tablename> <region1> <region2> + + If you feel you have too many regions and want to consolidate them, Merge is the utility you need. Merge must + be run when the cluster is down. + See the O'Reilly HBase Book for + an example of usage. + + +
+
Node Management
Node Decommission @@ -340,7 +364,6 @@ See Cluster Replication.
-
HBase Backup There are two broad strategies for performing HBase backups: backing up with a full cluster shutdown, and backing up on a live cluster. Index: src/docbkx/book.xml =================================================================== --- src/docbkx/book.xml (revision 1207982) +++ src/docbkx/book.xml (working copy) @@ -271,6 +271,12 @@ HTable.delete. + HBase does not modify data in place, and so deletes are handled by creating new markers called tombstones. + These tombstones, along with the dead values, are cleaned up on major compactions. + + See for more information on deleting versions of columns. + +
@@ -428,28 +434,20 @@ -
+
Delete - When performing a delete operation in HBase, there are two - ways to specify the versions to be deleted - - - - Delete all versions older than a certain timestamp + There are three different types of internal delete markers: + + Delete: for a specific version of a column. - - - Delete the version at a specific timestamp + Delete column: for all versions of a column. + + Delete family: for all columns of a particular ColumnFamily - - A delete can apply to a complete row, a complete column - family, or to just one column. It is only in the last case that you - can delete explicit versions. For the deletion of a row or all the - columns within a family, it always works by deleting all cells older - than a certain version. - + When deleting an entire row, HBase will internally create a tombstone for each ColumnFamily (i.e., not each individual column). + Deletes work by creating tombstone markers. For example, let's suppose we want to delete a row. For this you can specify a version, or else by default the @@ -466,8 +464,10 @@ . If the version you specified when deleting a row is larger than the version of any value in the row, then you can consider the complete row to be deleted. + Also see for more information on the internal KeyValue format. +
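The following toy Java model of a single row makes the marker semantics concrete. It is a sketch under stated assumptions, not HBase code. The classes ToyRow, Cell, Tombstone and MarkerType are hypothetical. The model assumes the usual timestamp rules: a Delete marker hides the one version at its timestamp, while Delete column and Delete family markers hide everything at or before their timestamp. It also reduces a major compaction to "drop masked cells and all markers", ignoring any settings that can retain deleted cells.

```java
import java.util.ArrayList;
import java.util.List;

// Toy, in-memory model of one row. NOT the HBase API: all names here are
// hypothetical and only illustrate how tombstones mask cells without editing them.
enum MarkerType { DELETE, DELETE_COLUMN, DELETE_FAMILY }

final class Cell {
    final String family, qualifier, value;
    final long ts;
    Cell(String family, String qualifier, long ts, String value) {
        this.family = family; this.qualifier = qualifier; this.ts = ts; this.value = value;
    }
}

final class Tombstone {
    final MarkerType type;
    final String family;
    final String qualifier; // null for DELETE_FAMILY
    final long ts;
    Tombstone(MarkerType type, String family, String qualifier, long ts) {
        this.type = type; this.family = family; this.qualifier = qualifier; this.ts = ts;
    }

    // DELETE hides exactly one version; DELETE_COLUMN hides all versions of one
    // column at or before ts; DELETE_FAMILY hides every column of the family at or before ts.
    boolean masks(Cell c) {
        if (!family.equals(c.family)) {
            return false;
        }
        switch (type) {
            case DELETE:
                return qualifier.equals(c.qualifier) && c.ts == ts;
            case DELETE_COLUMN:
                return qualifier.equals(c.qualifier) && c.ts <= ts;
            case DELETE_FAMILY:
                return c.ts <= ts;
            default:
                return false;
        }
    }
}

final class ToyRow {
    private List<Cell> cells = new ArrayList<>();
    private final List<Tombstone> tombstones = new ArrayList<>();

    void put(String family, String qualifier, long ts, String value) {
        cells.add(new Cell(family, qualifier, ts, value));
    }

    // A delete never modifies stored cells; it only appends a marker.
    void delete(MarkerType type, String family, String qualifier, long ts) {
        tombstones.add(new Tombstone(type, family, qualifier, ts));
    }

    // Deleting a whole row writes one DELETE_FAMILY marker per family,
    // not one marker per column.
    void deleteRow(long ts, String... families) {
        for (String f : families) {
            tombstones.add(new Tombstone(MarkerType.DELETE_FAMILY, f, null, ts));
        }
    }

    // Reads return only the cells that no marker masks.
    List<Cell> visible() {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            boolean masked = false;
            for (Tombstone t : tombstones) {
                if (t.masks(c)) { masked = true; break; }
            }
            if (!masked) { out.add(c); }
        }
        return out;
    }

    // Toy major compaction: physically drop masked cells, then the markers.
    void majorCompact() {
        cells = visible();
        tombstones.clear();
    }

    int storedCells() { return cells.size(); }
    int storedTombstones() { return tombstones.size(); }
}
```

In this model, visible() filters on every read, and masked cells only leave storage when majorCompact() runs. This mirrors why tombstones and dead values linger until a major compaction.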
-
+
Current Limitations @@ -1113,6 +1113,20 @@ }
+
+ HBase MapReduce Summary Without Reducer + It is also possible to perform summaries without a reducer, if you use HBase as the reducer. + + An HTable target table must exist for the job summary. The HTable method incrementColumnValue + is used to atomically increment values. From a performance perspective, it might make sense to keep a Map + of keys, with the values to be incremented, for each map task, and make one update per key during the + cleanup method of the mapper. However, your mileage may vary depending on the number of rows to be processed and + unique keys. + + In the end, the summary results are in HBase. + +
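The map-side aggregation described above can be sketched as follows. SummaryTable, InMemorySummaryTable and SummarizingMapTask are hypothetical names. In a real job, the increment would be a call to HTable.incrementColumnValue(row, family, qualifier, amount), and map()/cleanup() would be the corresponding methods of a Hadoop Mapper subclass. The in-memory table exists only so the sketch runs without a cluster.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the summary table. In a real job this would be
// HTable.incrementColumnValue(row, family, qualifier, amount).
interface SummaryTable {
    long increment(String rowKey, long amount);
}

// In-memory fake used only to exercise the sketch; counts calls so the
// number of round trips is visible.
final class InMemorySummaryTable implements SummaryTable {
    private final Map<String, Long> counts = new HashMap<>();
    int calls = 0;

    public long increment(String rowKey, long amount) {
        calls++;
        return counts.merge(rowKey, amount, Long::sum);
    }

    long get(String rowKey) { return counts.getOrDefault(rowKey, 0L); }
}

// Per-map-task logic. map() and cleanup() mirror the Mapper lifecycle,
// but this class does not extend Hadoop's Mapper.
final class SummarizingMapTask {
    private final Map<String, Long> pending = new HashMap<>();
    private final SummaryTable table;

    SummarizingMapTask(SummaryTable table) { this.table = table; }

    // Called once per input row: accumulate locally instead of hitting the table.
    void map(String summaryKey) {
        pending.merge(summaryKey, 1L, Long::sum);
    }

    // Called once at the end of the task: one increment per distinct key.
    void cleanup() {
        for (Map.Entry<String, Long> e : pending.entrySet()) {
            table.increment(e.getKey(), e.getValue());
        }
        pending.clear();
    }
}
```

The pending map holds one entry per distinct key for the life of the task. This is the trade-off the paragraph above alludes to. With few distinct keys, far fewer round trips reach HBase. With many distinct keys, each map task needs more memory.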
+
Accessing Other HBase Tables in a MapReduce Job