From 0909122937ee4f83c39f60468fbfcb11dd7a4439 Mon Sep 17 00:00:00 2001
From: Nick Dimiduk
Date: Mon, 4 Feb 2013 09:51:35 -0800
Subject: [PATCH] HBASE-7758 include description of CellCounter

The book includes a description of RowCounter but not the CellCounter job. Remedy the omission.
---
 src/docbkx/ops_mgt.xml | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/src/docbkx/ops_mgt.xml b/src/docbkx/ops_mgt.xml
index 1edd5c0..bc83ecf 100644
--- a/src/docbkx/ops_mgt.xml
+++ b/src/docbkx/ops_mgt.xml
@@ -265,16 +265,35 @@ row10 c1 c2
- RowCounter
- RowCounter is a mapreduce job to count all the rows of a table. This is a good utility to use
- as a sanity check to ensure that HBase can read all the blocks of a table if there are any concerns of metadata inconsistency.
- It will run the mapreduce all in a single process but it will run faster if you have a MapReduce cluster in place for it to
- exploit.
+ RowCounter and CellCounter
+ RowCounter is a
+ mapreduce job to count all the rows of a table. This is a good utility to use as a sanity check to ensure that HBase can read
+ all the blocks of a table if there are any concerns of metadata inconsistency. It will run the mapreduce all in a single
+ process but it will run faster if you have a MapReduce cluster in place for it to exploit.
 $ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename> [<column1> <column2>...]
- Note: caching for the input Scan is configured via hbase.client.scanner.caching in the job configuration.
+ Note: caching for the input Scan is configured via hbase.client.scanner.caching in the job configuration.
+ HBase ships another diagnostic mapreduce job called
+ CellCounter. Like
+ RowCounter, it scans the table, but it gathers more fine-grained statistics. The statistics gathered by CellCounter
+ include:
+
+ Total number of rows in the table.
+ Total number of CFs across all rows.
+ Total qualifiers across all rows.
+ Total occurrence of each CF.
+ Total occurrence of each qualifier.
+ Total number of versions of each qualifier.
+
+
+ The program allows you to limit the scope of the run. Provide a row regex or prefix to limit the rows to analyze. Use
+ hbase.mapreduce.scan.column.family to specify scanning a single column family.
+ $ bin/hbase org.apache.hadoop.hbase.mapreduce.CellCounter <tablename> <outputDir> [regex or prefix]
+
+ Note: just like RowCounter, caching for the input Scan is configured via hbase.client.scanner.caching in the
+ job configuration.
-- 
1.8.1