Index: src/docbkx/book.xml
===================================================================
--- src/docbkx/book.xml	(revision 1201992)
+++ src/docbkx/book.xml	(working copy)
@@ -1017,7 +1017,7 @@
 	job);
 TableMapReduceUtil.initTableReducerJob(
 	targetTable,            // output table
-	MyReducer.class,        // reducer class
+	MyTableReducer.class,   // reducer class
 	job);
 job.setNumReduceTasks(1);   // at least one, adjust as required
@@ -1044,7 +1044,7 @@
         In the reducer, the "ones" are counted (just like any other MR example that does this), and then a Put is emitted.
-public static class MyReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable>  {
+public static class MyTableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable>  {
 	public void reduce(Text key, Iterable<IntWritable> values, Context context)
 		throws IOException, InterruptedException {
 		int i = 0;
@@ -1058,10 +1058,55 @@
 	}
 }
-
+
+        HBase MapReduce Summary to File Example
+        This is very similar to the summary example above, with the exception that it uses HBase as a MapReduce source
+        but HDFS as the sink. The differences are in the job setup and in the reducer. The mapper remains the same.
+
+Configuration config = HBaseConfiguration.create();
+Job job = new Job(config, "ExampleSummaryToFile");
+job.setJarByClass(MySummaryFileJob.class);   // class that contains mapper and reducer
+
+Scan scan = new Scan();
+scan.setCaching(500);         // 1 is the default in Scan, which will be bad for MapReduce jobs
+scan.setCacheBlocks(false);   // don't set to true for MR jobs
+// set other scan attrs
+
+TableMapReduceUtil.initTableMapperJob(
+	sourceTable,        // input table
+	scan,               // Scan instance to control CF and attribute selection
+	MyMapper.class,     // mapper class
+	Text.class,         // mapper output key
+	IntWritable.class,  // mapper output value
+	job);
+job.setReducerClass(MyReducer.class);   // reducer class
+job.setNumReduceTasks(1);               // at least one, adjust as required
+FileOutputFormat.setOutputPath(job, new Path("/tmp/mr/mySummaryFile"));  // adjust directories as required
+
+boolean b = job.waitForCompletion(true);
+if (!b) {
+	throw new IOException("error with job!");
+}
+
+        As stated above, the previous Mapper can run unchanged with this example.
+        As for the Reducer, it is a "generic" Reducer instead of extending TableReducer and emitting Puts.
+
+public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
+
+	public void reduce(Text key, Iterable<IntWritable> values, Context context)
+		throws IOException, InterruptedException {
+		int i = 0;
+		for (IntWritable val : values) {
+			i += val.get();
+		}
+		context.write(key, new IntWritable(i));
+	}
+}
+
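+        For reference, a minimal sketch of that unchanged Mapper, as used in the summary example
+        above. The column family "cf" and qualifier "attr1" are illustrative placeholders, not
+        prescribed names; substitute the ones from your own schema.
+
+// Imports assumed at the top of the enclosing job class's file:
+//   java.io.IOException, org.apache.hadoop.hbase.client.Result,
+//   org.apache.hadoop.hbase.io.ImmutableBytesWritable, org.apache.hadoop.hbase.mapreduce.TableMapper,
+//   org.apache.hadoop.hbase.util.Bytes, org.apache.hadoop.io.IntWritable, org.apache.hadoop.io.Text
+public static class MyMapper extends TableMapper<Text, IntWritable> {
+
+	private final IntWritable ONE = new IntWritable(1);
+	private Text text = new Text();
+
+	public void map(ImmutableBytesWritable row, Result value, Context context)
+		throws IOException, InterruptedException {
+		// Read the cell to be counted and emit (value, 1), WordCount-style.
+		String val = new String(value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr1")));
+		text.set(val);
+		context.write(text, ONE);
+	}
+}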
+
         Accessing Other HBase Tables in a MapReduce Job
 	Although the framework currently allows one HBase table as input to a
Index: src/docbkx/troubleshooting.xml
===================================================================
--- src/docbkx/troubleshooting.xml	(revision 1201939)
+++ src/docbkx/troubleshooting.xml	(working copy)
@@ -535,6 +535,8 @@
 hadoop fs -dus /hbase/         ...returns the summarized disk utilization for all HBase objects.
 hadoop fs -dus /hbase/myTable  ...returns the summarized disk utilization for the HBase table 'myTable'.
 hadoop fs -du /hbase/myTable   ...returns a list of the regions under the HBase table 'myTable' and their disk utilization.
+            For more information on HDFS shell commands, see the HDFS FileSystem Shell documentation.
+
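+            As an illustration only (not part of this patch), the same per-table check can be done
+            programmatically with the Hadoop FileSystem API. A minimal sketch; the path /hbase/myTable
+            and the class name TableDiskUsage are hypothetical placeholders:
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.ContentSummary;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+
+public class TableDiskUsage {
+	public static void main(String[] args) throws Exception {
+		Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
+		FileSystem fs = FileSystem.get(conf);
+		// Equivalent of 'hadoop fs -dus /hbase/myTable'
+		ContentSummary summary = fs.getContentSummary(new Path("/hbase/myTable"));
+		System.out.println("summarized disk utilization: " + summary.getLength() + " bytes");
+	}
+}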
         Browsing HDFS for HBase Objects
@@ -558,6 +560,9 @@
             /<HLog>     (WAL HLog files for the RegionServer)
+            See the HDFS User Guide for other non-shell diagnostic
+            utilities like fsck.
+
         Use Cases
             Two common use cases for querying HDFS for HBase objects include researching the degree of uncompaction of
             a table. If there are a large number of StoreFiles for each ColumnFamily, it could