Index: src/docbkx/performance.xml
===================================================================
--- src/docbkx/performance.xml (revision 1206321)
+++ src/docbkx/performance.xml (working copy)
@@ -140,14 +140,6 @@
The number of regions for an HBase table is driven by the . Also, see the architecture
section on
- A lower number of regions is preferred, generally in the range of 20 to low-hundreds
- per RegionServer. Adjust the regionsize as appropriate to achieve this number.
-
- For the 0.90.x codebase, the upper-bound of regionsize is about 4Gb.
- For 0.92.x codebase, due to the HFile v2 change much larger regionsizes can be supported (e.g., 20Gb).
-
- You may need to experiment with this setting based on your hardware configuration and application needs.
-
@@ -161,15 +153,7 @@
hbase.regionserver.handler.countSee .
- This setting in essence sets how many requests are
- concurrently being processed inside the RegionServer at any
- one time. If set too high, then throughput may suffer as
- the concurrent requests contend; if set too low, requests will
- be stuck waiting to get into the machine. You can get a
- sense of whether you have too little or too many handlers by
-
- on an individual RegionServer then tailing its logs (Queued requests
- consume memory).
+
hfile.block.cache.size
Index: src/docbkx/book.xml
===================================================================
--- src/docbkx/book.xml (revision 1206321)
+++ src/docbkx/book.xml (working copy)
@@ -1556,41 +1556,30 @@
Regions
- This section is all about Regions.
-
- Regions are comprised of a Store per Column Family.
-
-
+ Regions are the basic element of availability and
+ distribution for tables, and each Region consists of one Store per Column Family.
+ Region Size
- Region size is one of those tricky things, there are a few factors
+ Determining the "right" region size can be tricky, and there are a few factors
to consider:
- Regions are the basic element of availability and
- distribution.
-
-
- HBase scales by having regions across many servers. Thus if
- you have 2 regions for 16GB data, on a 20 node machine you are a net
- loss there.
+ you have 2 regions for 16GB of data on a 20-node cluster, your data
+ will be concentrated on just a few machines, and nearly the entire
+ cluster will be idle. This really can't be stressed enough, since a
+ common problem is loading 200MB of data into HBase and then wondering why
+ your awesome 10-node cluster isn't doing anything.
- High region count has been known to make things slow, this is
- getting better, but it is probably better to have 700 regions than
- 3000 for the same amount of data.
-
-
-
- Low region count prevents parallel scalability as per point
- #2. This really cant be stressed enough, since a common problem is
- loading 200MB data into HBase then wondering why your awesome 10
- node cluster is mostly idle.
+ On the other hand, a high region count has been known to make things slow.
+ This is getting better with each release of HBase, but it is probably better to have
+ 700 regions than 3000 for the same amount of data.
@@ -1599,10 +1588,12 @@
- Its probably best to stick to the default, perhaps going smaller
- for hot tables (or manually split hot regions to spread the load over
- the cluster), or go with a 1GB region size if your cell sizes tend to be
+ When starting off, it's probably best to stick to the default region size, perhaps going
+ smaller for hot tables (or manually splitting hot regions to spread the load over
+ the cluster), or going with larger region sizes if your cell sizes tend to be
largish (100k and up).
+ See for more information on configuration.
+
Index: src/docbkx/troubleshooting.xml
===================================================================
--- src/docbkx/troubleshooting.xml (revision 1206321)
+++ src/docbkx/troubleshooting.xml (working copy)
@@ -574,6 +574,18 @@
+
+ Network
+
+ Network Spikes
+ If you are seeing periodic network spikes, you might want to check the compactionQueues to see whether major
+ compactions are happening.
+
+ See for more information on managing compactions.
+
+
+
+
RegionServerFor more information on the RegionServers, see .
Index: src/docbkx/configuration.xml
===================================================================
--- src/docbkx/configuration.xml (revision 1206321)
+++ src/docbkx/configuration.xml (working copy)
@@ -1028,6 +1028,11 @@
throughput is affected since every request that hits that region server will take longer,
which exacerbates the problem even more.
+ You can get a sense of whether you have too few or too many handlers by
+
+ on an individual RegionServer and then tailing its logs (queued requests
+ consume memory).
+ Configuration for large memory machines
@@ -1054,11 +1059,20 @@
Consider going to larger regions to cut down on the total number of regions
on your cluster. Generally less Regions to manage makes for a smoother running
cluster (You can always later manually split the big Regions should one prove
- hot and you want to spread the request load over the cluster). By default,
- regions are 256MB in size. You could run with
- 1G. Some run with even larger regions; 4G or even larger. Adjust
- hbase.hregion.max.filesize in your hbase-site.xml.
+ hot and you want to spread the request load over the cluster). A lower number of regions is
+ preferred, generally in the range of 20 to the low hundreds
+ per RegionServer. Adjust the region size as appropriate to achieve this number.
+
+ For the 0.90.x codebase, the upper bound of region size is about 4GB, with a default of 256MB.
+ For the 0.92.x codebase, due to the HFile v2 change, much larger region sizes can be supported (e.g., 20GB).
+
+ You may need to experiment with this setting based on your hardware configuration and application needs.
+
+ Adjust hbase.hregion.max.filesize in your hbase-site.xml.
+ Region size can also be set on a per-table basis via
+ HTableDescriptor.
+
Managed Splitting