Index: src/docbkx/performance.xml =================================================================== --- src/docbkx/performance.xml (revision 1206321) +++ src/docbkx/performance.xml (working copy) @@ -140,14 +140,6 @@ The number of regions for an HBase table is driven by the . Also, see the architecture section on - A lower number of regions is preferred, generally in the range of 20 to low-hundreds - per RegionServer. Adjust the regionsize as appropriate to achieve this number. - - For the 0.90.x codebase, the upper-bound of regionsize is about 4Gb. - For 0.92.x codebase, due to the HFile v2 change much larger regionsizes can be supported (e.g., 20Gb). - - You may need to experiment with this setting based on your hardware configuration and application needs. -
@@ -161,15 +153,7 @@
<varname>hbase.regionserver.handler.count</varname> See . - This setting in essence sets how many requests are - concurrently being processed inside the RegionServer at any - one time. If set too high, then throughput may suffer as - the concurrent requests contend; if set too low, requests will - be stuck waiting to get into the machine. You can get a - sense of whether you have too little or too many handlers by - - on an individual RegionServer then tailing its logs (Queued requests - consume memory). +
<varname>hfile.block.cache.size</varname> Index: src/docbkx/book.xml =================================================================== --- src/docbkx/book.xml (revision 1206321) +++ src/docbkx/book.xml (working copy) @@ -1556,41 +1556,30 @@
Regions - This section is all about Regions. - - Regions are comprised of a Store per Column Family. - - + Regions are the basic element of availability and + distribution for tables, and are comprised of a Store per Column Family. +
Region Size - Region size is one of those tricky things, there are a few factors + Determining the "right" region size can be tricky, and there are a few factors to consider: - Regions are the basic element of availability and - distribution. - - - HBase scales by having regions across many servers. Thus if - you have 2 regions for 16GB data, on a 20 node machine you are a net - loss there. + you have 2 regions for 16GB data, on a 20 node cluster your data + will be concentrated on just a few machines - nearly the entire + cluster will be idle. This really can't be stressed enough, since a + common problem is loading 200MB data into HBase then wondering why + your awesome 10 node cluster isn't doing anything. - High region count has been known to make things slow, this is - getting better, but it is probably better to have 700 regions than - 3000 for the same amount of data. - - - - Low region count prevents parallel scalability as per point - #2. This really cant be stressed enough, since a common problem is - loading 200MB data into HBase then wondering why your awesome 10 - node cluster is mostly idle. + On the other hand, high region count has been known to make things slow. + This is getting better with each release of HBase, but it is probably better to have + 700 regions than 3000 for the same amount of data. @@ -1599,10 +1588,12 @@ - Its probably best to stick to the default, perhaps going smaller - for hot tables (or manually split hot regions to spread the load over - the cluster), or go with a 1GB region size if your cell sizes tend to be + When starting off, it's probably best to stick to the default region-size, perhaps going + smaller for hot tables (or manually split hot regions to spread the load over + the cluster), or go with larger region sizes if your cell sizes tend to be largish (100k and up). + See for more information on configuration. +
Index: src/docbkx/troubleshooting.xml =================================================================== --- src/docbkx/troubleshooting.xml (revision 1206321) +++ src/docbkx/troubleshooting.xml (working copy) @@ -574,6 +574,18 @@
+
+ Network +
+ Network Spikes + If you are seeing periodic network spikes you might want to check the compactionQueues to see if major + compactions are happening. + + See for more information on managing compactions. + +
+
+
RegionServer For more information on the RegionServers, see . Index: src/docbkx/configuration.xml =================================================================== --- src/docbkx/configuration.xml (revision 1206321) +++ src/docbkx/configuration.xml (working copy) @@ -1028,6 +1028,11 @@ throughput is affected since every request that hits that region server will take longer, which exacerbates the problem even more. + You can get a sense of whether you have too few or too many handlers by + + on an individual RegionServer then tailing its logs (Queued requests + consume memory). +
Configuration for large memory machines @@ -1054,11 +1059,20 @@ Consider going to larger regions to cut down on the total number of regions on your cluster. Generally less Regions to manage makes for a smoother running cluster (You can always later manually split the big Regions should one prove - hot and you want to spread the request load over the cluster). By default, - regions are 256MB in size. You could run with - 1G. Some run with even larger regions; 4G or even larger. Adjust - hbase.hregion.max.filesize in your hbase-site.xml. + hot and you want to spread the request load over the cluster). A lower number of regions is + preferred, generally in the range of 20 to low-hundreds + per RegionServer. Adjust the regionsize as appropriate to achieve this number. + + For the 0.90.x codebase, the upper-bound of regionsize is about 4GB, with a default of 256MB. + For the 0.92.x codebase, due to the HFile v2 change much larger regionsizes can be supported (e.g., 20GB). + + You may need to experiment with this setting based on your hardware configuration and application needs. + + Adjust hbase.hregion.max.filesize in your hbase-site.xml. + RegionSize can also be set on a per-table basis via + HTableDescriptor. +
Managed Splitting