Index: src/docbkx/book.xml
===================================================================
--- src/docbkx/book.xml (revision 1214745)
+++ src/docbkx/book.xml (working copy)
@@ -1142,7 +1142,8 @@
 It is critical to understand that number of reducers for the job affects the summarization implementation, and you'll have to design this into your reducer. Specifically, whether it is designed to run as a singleton (one reducer)
- or multiple reducers. Neither is right or wrong, it depends on your use-case.
+ or multiple reducers. Neither is right or wrong, it depends on your use-case. Recognize that the more reducers that
+ are assigned to the job, the more simultaneous connections to the RDBMS will be created - this will scale, but only to a point.
 public static class MyRdbmsReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
@@ -1164,7 +1165,7 @@
 }
- In the end, the summary results are in HBase.
+ In the end, the summary results are written to your RDBMS table/s.
@@ -1731,12 +1732,14 @@
 The AssignmentManager looks at the existing region assignments in META.
- If the region assignment is still valid (i.e., if the RegionServer) is still online
+ If the region assignment is still valid (i.e., if the RegionServer is still online)
 then the assignment is kept.
 If the assignment is invalid, then the LoadBalancerFactory is invoked to assign the
- region. The DefaultLoadBalancer will randomly assign the region to a RegionServer and
- update META.
+ region. The DefaultLoadBalancer will randomly assign the region to a RegionServer.
+
+
+ META is updated with the RegionServer assignment (if needed) and the RegionServer start codes
+ (start time of the RegionServer process) upon region opening by the RegionServer.
@@ -1755,7 +1758,6 @@
-
@@ -1769,9 +1771,8 @@
 Region-RegionServer Locality
- Over time, Region-RegionServer locality is achieved via the an aspect of
- HDFS block replication. The HDFS client when choosing where to write it replicas,
- by default does as follows:
+ Over time, Region-RegionServer locality is achieved via HDFS block replication.
+ The HDFS client does the following by default when choosing locations to write replicas:
 First replica is written to local node
@@ -1780,9 +1781,9 @@
 Third replica is written to a node in another rack (if sufficient nodes)
- HBase eventually achieves locality for a region after a flush a compaction.
+ Thus, HBase eventually achieves locality for a region after a flush or a compaction.
 In a RegionServer failover situation a RegionServer may be assigned regions with non-local
- StoreFiles (i.e., none of the replicas are local), however eventually as new data is written
+ StoreFiles (because none of the replicas are local), however as new data is written
 in the region, or the table is compacted and StoreFiles are re-written, they will become "local" to the RegionServer.
@@ -2046,6 +2047,16 @@
+ Architecture
+
+ How does HBase handle Region-RegionServer assignment and locality?
+
+
+ See .
+
+
+
+
 Configuration
 How can I get started with my first cluster?