diff --git src/main/docbkx/book.xml src/main/docbkx/book.xml index 19dd770..0c2b2c1 100644 --- src/main/docbkx/book.xml +++ src/main/docbkx/book.xml @@ -3081,6 +3081,88 @@ myHtd.setValue(HTableDescriptor.SPLIT_POLICY, MyCustomSplitPolicy.class.getName( +
+ Manual Region Splitting + It is possible to manually split your table into regions, either at table creation + (pre-splitting) or at a later time by altering the table. You might choose to split your + table for one or more of the following reasons. There may be other valid reasons, but the + need to manually split your table might also point to problems with your schema + design. + + Reasons to Manually Split Your Table + + Your rowkeys are a time series or otherwise increase monotonically, so new data is + always written at the end of the table. This means that the Region Server holding the last region is + always under load, and the other Region Servers are idle, or mostly idle. + + + You have developed an unexpected hotspot in one region of your table. For + instance, an application which tracks web searches might be inundated by + searches for a celebrity when news about that celebrity breaks. + + + You have greatly increased the number of Region Servers in your cluster and want to + spread the load out quickly. + + + You are about to run a bulk load which is likely to cause unusual and uneven load across + regions. + + +
+ Determining Split Points + The goal of splitting your table manually is to improve the chances of balancing the + load across the cluster in situations where good rowkey design alone won't get you + there. Keeping that in mind, the way you split your regions is very dependent upon the + characteristics of your data. It may be that you already know the best way to split your + table. If not, the way you split your table depends on what your keys are like. + + + Alphanumeric Rowkeys + + If your rowkeys start with a letter or number, you can split your table at + letter or number boundaries. For instance, the following command creates a table + with six regions, split at each lowercase vowel: the first region holds keys that sort before 'a', the second + region has a-d, the third region has e-h, the fourth region has i-n, the fifth + region has o-t, and the sixth region holds keys from 'u' onward. + hbase> create 'test_table', 'f1', SPLITS=> ['a', 'e', 'i', 'o', 'u'] + The following command splits an existing table at split point '2'. + hbase> split 'test_table', '2' + You can also split a specific region by referring to its ID. You can find the + region ID by looking at either the table or region in the Web UI. It will be a + long string such as + t2,1,1410227759524.829850c6eaba1acc689480acd8f081bd. (the trailing period is part of the ID). The + format is table_name,start_key,region_id. To split that + region into two, as close to equally as possible (at the nearest row boundary), + issue the following command. + hbase> split 't2,1,1410227759524.829850c6eaba1acc689480acd8f081bd.' + The split key is optional. If it is omitted, the table or region is split in + half. + The following example shows how to use the RegionSplitter to create 10 + regions, split at hexadecimal values. + hbase org.apache.hadoop.hbase.util.RegionSplitter test_table HexStringSplit -c 10 -f f1 + + + + Using a Custom Algorithm + + The RegionSplitter tool is provided with HBase, and uses a SplitAlgorithm to determine split points for you. 
As + parameters, you give it the algorithm, desired number of regions, and column + families. It includes two split algorithms. The first is the HexStringSplit algorithm, which assumes the row keys are + hexadecimal strings. The second, UniformSplit, assumes the row keys are random byte arrays. You will + probably need to develop your own SplitAlgorithm, using the provided ones as + models. + + + +
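To make the idea of algorithmically chosen split points concrete, the following is a minimal, self-contained Java sketch that computes evenly spaced hexadecimal split keys. It does not use or reproduce the HBase SplitAlgorithm interface or the HexStringSplit implementation. The class name HexSplitSketch, the method splitPoints, and the assumption of an 8-digit keyspace running from 00000000 to ffffffff are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: NOT part of HBase and not the HexStringSplit source.
public class HexSplitSketch {

    // Assumed keyspace: 8-digit hex rowkey prefixes, i.e. 16^8 = 2^32 values.
    private static final long KEYSPACE = 1L << 32;

    // Returns numRegions - 1 split keys that cut the assumed keyspace
    // into numRegions ranges of (nearly) equal size.
    public static List<String> splitPoints(int numRegions) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions");
        }
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            long point = KEYSPACE * i / numRegions;
            // Zero-pad to 8 lowercase hex digits so keys sort lexicographically.
            splits.add(String.format("%08x", point));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Print the split keys for a table pre-split into 10 regions.
        for (String key : splitPoints(10)) {
            System.out.println(key);
        }
    }
}
```

A table with N regions needs N - 1 split keys, which is why the loop starts at 1 and stops before numRegions. A custom algorithm for your own rowkeys would follow the same pattern, replacing the uniform arithmetic with whatever distribution matches your data.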
+
Online Region Merges