diff --git src/main/docbkx/schema_design.xml src/main/docbkx/schema_design.xml
index de05c14..2e288f1 100644
--- src/main/docbkx/schema_design.xml
+++ src/main/docbkx/schema_design.xml
@@ -99,6 +99,35 @@ admin.enableTable(table);
Rowkey Design +
+ Hotspotting
+ Rows in HBase are sorted lexicographically by row key. This design optimizes for scans,
+ allowing you to store related rows, or rows that will be read together, near each other.
+ However, poorly designed row keys are a common source of hotspotting.
+ Hotspotting occurs when a large amount of client traffic is directed at one node, or only a
+ few nodes, of a cluster. This traffic may represent reads, writes, or other operations. The
+ traffic overwhelms the single machine responsible for hosting the affected region, causing
+ performance degradation and potentially leading to region unavailability. It can also
+ adversely affect other regions hosted by the same region server, because that host is unable
+ to service the requested load. It is important to design data access patterns such that the
+ cluster is fully and evenly utilized.
+ To prevent hotspotting on writes, design your row keys so that rows that truly need to be
+ in the same region stay together, while overall writes are spread across multiple regions
+ of the cluster rather than landing on one region at a time. One technique is to
+ salt row keys which would otherwise be sequential. In this case,
+ salting refers to adding a prefix to the row key to cause it to sort differently than it
+ otherwise would. One example of salting row keys to prevent hotspotting can be found at
+ http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/.
+ Another is at https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/11/10/discussion-on-designing-hbase-tables.
+ You could even salt with a random value, for data which will be accessed randomly. However,
+ using totally random row keys for data which is accessed sequentially would remove the
+ benefit of HBase's sorted row ordering and cause very poor performance, as each get or
+ scan would need to query all regions.
+
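As a rough illustration of the salting idea described above, the following minimal sketch prepends a one-byte bucket prefix to a row key. It uses only the Java standard library and does not touch the HBase client API. The class name SaltedRowKey, its methods, and the bucket count are hypothetical names chosen for this example. The prefix is derived from a hash of the original key rather than picked at random, which is an assumption made here so that a reader can recompute the prefix for a known key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of HBase.
final class SaltedRowKey {

    // Bucket prefix for a key: a hash of the key bytes modulo the bucket count.
    // Math.floorMod keeps the result in [0, buckets) even when the hash is negative.
    static byte bucketFor(byte[] originalKey, int buckets) {
        return (byte) Math.floorMod(Arrays.hashCode(originalKey), buckets);
    }

    // Returns a new key consisting of the one-byte bucket prefix followed by the original key.
    static byte[] salt(byte[] originalKey, int buckets) {
        if (buckets < 1 || buckets > 128) {
            throw new IllegalArgumentException("buckets must be between 1 and 128");
        }
        byte[] salted = new byte[originalKey.length + 1];
        salted[0] = bucketFor(originalKey, buckets);
        System.arraycopy(originalKey, 0, salted, 1, originalKey.length);
        return salted;
    }

    // Strips the one-byte prefix to recover the original key.
    static byte[] unsalt(byte[] saltedKey) {
        return Arrays.copyOfRange(saltedKey, 1, saltedKey.length);
    }

    public static void main(String[] args) {
        int buckets = 4; // assumed bucket count for this example
        for (int i = 0; i < 8; i++) {
            byte[] key = String.format("event-%04d", i).getBytes(StandardCharsets.UTF_8);
            byte[] salted = salt(key, buckets);
            // Print the bucket prefix next to the recovered original key.
            System.out.println(salted[0] + " | "
                    + new String(unsalt(salted), StandardCharsets.UTF_8));
        }
    }
}
```

With a scheme like this, sequential keys sort under different prefixes and so can land in different regions. The trade-off matches the caveat in the text: a scan over a range of original keys has to be issued once per bucket prefix.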
Monotonically Increasing Row Keys/Timeseries Data