From 5125e8ecf9fab4848bd90c409b8ae07c7fa5f999 Mon Sep 17 00:00:00 2001 From: Misty Stanley-Jones Date: Mon, 3 Nov 2014 14:28:31 +1000 Subject: [PATCH] HBASE-12409 Add actual tunable parameters to regions per RS calculations --- src/main/docbkx/ops_mgt.xml | 41 +++++++++++++++++++++++++---------------- 1 file changed, 25 insertions(+), 16 deletions(-) diff --git a/src/main/docbkx/ops_mgt.xml b/src/main/docbkx/ops_mgt.xml index cd6562f..56b171a 100644 --- a/src/main/docbkx/ops_mgt.xml +++ b/src/main/docbkx/ops_mgt.xml @@ -2132,22 +2132,31 @@ hbase> restore_snapshot 'myTableSnapshot-122112' xml:id="ops.capacity.regions.count"> Number of regions per RS - upper bound In production scenarios, where you have a lot of data, you are normally concerned with - the maximum number of regions you can have per server. has technical discussion on the subject; in short, maximum - number of regions is mostly determined by memstore memory usage. Each region has its own - memstores; these grow up to a configurable size; usually in 128-256Mb range, see . There's one memstore per column family - (so there's only one per region if there's one CF in the table). RS dedicates some - fraction of total memory (see ) to region memstores. If this - memory is exceeded (too much memstore usage), undesirable consequences such as - unresponsive server, or later compaction storms, can result. Thus, a good starting point - for the number of regions per RS (assuming one table) is: - - (RS memory)*(total memstore fraction)/((memstore size)*(# column families)) - E.g. if RS has 16Gb RAM, with default settings, it is 16384*0.4/128 ~ 51 regions per - RS is a starting point. The formula can be extended to multiple tables; if they all have - the same configuration, just use total number of families. + the maximum number of regions you can have per server. + has technical discussion on the subject. Basically, the maximum number of regions is + mostly determined by memstore memory usage. 
Each region has its own memstores; these grow + up to a configurable size; usually in the 128-256 MB range, see . One memstore exists per column family (so + there's only one per region if there's one CF in the table). The RS dedicates some + fraction of total memory to its memstores (see ). If this memory is exceeded (too + much memstore usage), it can cause undesirable consequences such as an unresponsive server or + compaction storms. A good starting point for the number of regions per RS (assuming one + table) is: + + ((RS memory) * (total memstore fraction)) / ((memstore size)*(# column families)) + This formula is pseudo-code. Here are two formulas using the actual tunable + parameters, first for HBase 0.98+ and second for HBase 0.94.x. + + HBase 0.98.x:((RS Xmx) * hbase.regionserver.global.memstore.size) / + (hbase.hregion.memstore.flush.size * (# column families)) + HBase 0.94.x:((RS Xmx) * hbase.regionserver.global.memstore.upperLimit) / + (hbase.hregion.memstore.flush.size * (# column families)) + + If a given RegionServer has 16 GB of RAM, with default settings, the formula works out + to 16384*0.4/128 ~ 51 regions per RS as a starting point. The formula can be extended to + multiple tables; if they all have the same configuration, just use the total number of + families. This number can be adjusted; the formula above assumes all your regions are filled at approximately the same rate. If only a fraction of your regions are going to be actively written to, you can divide the result by that fraction to get a larger region count. Then, -- 2.1.2
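As a rough illustration of the formula this patch documents, the following Python sketch computes the starting-point region count. The function name and its parameters are hypothetical and not part of HBase; the default values (a memstore fraction of 0.4 and a flush size of 128 MB) are simply the ones used in the worked example in the patch, so check them against your own configuration.

```python
def regions_per_rs(heap_mb, memstore_fraction=0.4, flush_size_mb=128,
                   column_families=1, active_fraction=1.0):
    """Estimate a starting point for the number of regions per RegionServer.

    heap_mb           -- RegionServer heap (Xmx) in MB
    memstore_fraction -- hbase.regionserver.global.memstore.size (0.98+) or
                         hbase.regionserver.global.memstore.upperLimit (0.94.x)
    flush_size_mb     -- hbase.hregion.memstore.flush.size, expressed in MB
    column_families   -- total number of column families
    active_fraction   -- assumed share of regions that are actively written to
    """
    if flush_size_mb <= 0 or column_families <= 0:
        raise ValueError("flush_size_mb and column_families must be positive")
    if not 0 < active_fraction <= 1:
        raise ValueError("active_fraction must be in (0, 1]")

    # (RS Xmx * memstore fraction) / (flush size * # column families)
    base = (heap_mb * memstore_fraction) / (flush_size_mb * column_families)

    # If only a fraction of regions take writes, dividing by that fraction
    # yields a larger region count, as described in the text.
    return base / active_fraction


if __name__ == "__main__":
    # 16 GB heap with the settings from the worked example
    print(int(regions_per_rs(16384)))
```

Calling `regions_per_rs(16384)` reproduces the 16384*0.4/128 calculation from the text; passing `active_fraction=0.5` doubles the estimate, which corresponds to the adjustment described for tables where only some regions are written to.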