diff --git a/src/main/asciidoc/_chapters/architecture.adoc b/src/main/asciidoc/_chapters/architecture.adoc
index 6de7208..1334771 100644
--- a/src/main/asciidoc/_chapters/architecture.adoc
+++ b/src/main/asciidoc/_chapters/architecture.adoc
@@ -1353,6 +1353,27 @@ Splits run unaided on the RegionServer; i.e. the Master does not participate.
 The RegionServer splits a region, offlines the split region and then adds the daughter regions to `hbase:meta`, opens daughters on the parent's hosting RegionServer and then reports the split to the Master.
 See <> for how to manually manage splits (and for why you might do this).
 
+==== Split Policies
+HBase includes five different region split policies. In addition, you can create your own split policy. See <>. To configure a split policy globally or for a given table, see <>.
+
+.Included Split Policies
+* link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.html[IncreasingToUpperBoundRegionSplitPolicy] -- The default split policy since HBase 0.94. Splits regions based on the size of the store files, but splits more aggressively when a RegionServer hosts only a few regions of the same table.
+* link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/KeyPrefixRegionSplitPolicy.html[KeyPrefixRegionSplitPolicy] -- Extends link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.html[IncreasingToUpperBoundRegionSplitPolicy]. You specify a prefix length, and rows that share the same prefix up to that length are always kept in the same region.
+* link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/DelimitedKeyPrefixRegionSplitPolicy.html[DelimitedKeyPrefixRegionSplitPolicy] -- Extends link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.html[IncreasingToUpperBoundRegionSplitPolicy]. If your row keys are delimited, for instance with underscores as in `userid_eventtype_eventid`, this split policy ensures that all rows starting with the same `userid` stay together when a region splits.
+* link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.html[ConstantSizeRegionSplitPolicy] -- The default split policy in HBase versions before 0.94. Splits regions based only on the size of the store files.
+
+* link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/DisabledRegionSplitPolicy.html[DisabledRegionSplitPolicy] -- Disables splitting for this table. Not recommended!
+
+.Choosing a Split Policy
+When choosing a split policy, globally or for a given table, consider the characteristics of your data, the pattern of your row keys, and the way you access the data. The following questions may be helpful when deciding on a region split policy. You may even use these questions to decide on a schema for your row keys.
+
+* Are your row keys "chunked" by common prefixes that are useful when scanning? Consider link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/KeyPrefixRegionSplitPolicy.html[KeyPrefixRegionSplitPolicy].
+* Are your row keys delimited by specific patterns that are useful when scanning? Consider link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/DelimitedKeyPrefixRegionSplitPolicy.html[DelimitedKeyPrefixRegionSplitPolicy].
+* Is it more important to control the size of your regions (link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.html[IncreasingToUpperBoundRegionSplitPolicy]), the number of rows in a region (link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/DelimitedKeyPrefixRegionSplitPolicy.html[DelimitedKeyPrefixRegionSplitPolicy]), or the overall size of your store files (link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.html[ConstantSizeRegionSplitPolicy])?
+* For a given table, do different columns hold cells of radically different sizes? Consider link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.html[IncreasingToUpperBoundRegionSplitPolicy].
+* Do your needs fall outside the scope of any of the existing region split policies? In this case, consider implementing your own <>.
+
+[[region.split.policies.custom]]
 ==== Custom Split Policies
 You can override the default split policy using a custom link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/RegionSplitPolicy.html[RegionSplitPolicy] (HBase 0.94+). Typically a custom split policy should extend
 HBase's default split policy: link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.html[IncreasingToUpperBoundRegionSplitPolicy].
@@ -1360,6 +1381,8 @@ HBase's default split policy: link:http://hbase.apache.org/apidocs/org/apache/ha
 
 The policy can be set globally through the HBase configuration or on a per-table basis.
 
+[[region.split.policy.configure]]
+==== Configuring a Split Policy
 .Configuring the Split Policy Globally in _hbase-site.xml_
 [source,xml]
 ----