diff --git src/main/docbkx/schema_design.xml src/main/docbkx/schema_design.xml
index de05c14..84c35e7 100644
--- src/main/docbkx/schema_design.xml
+++ src/main/docbkx/schema_design.xml
@@ -99,6 +99,23 @@ admin.enableTable(table);
Rowkey Design +
+ Hotspotting
+ Rows in HBase are sorted lexicographically by row key. This design optimizes for scans,
+ allowing you to store related rows, or rows that will be read together, near each other.
+ HBase also attempts to store rows near each other in the same region, on the same region
+ server. This is referred to as locality. However, poorly designed row
+ keys can lead to hotspotting. Hotspotting occurs when nearly all the
+ rows being written to HBase are written to the same region, because their row keys are
+ contiguous or very similar. Hotspotting causes serious performance degradation, because one
+ region server is overloaded while the others sit idle.
+ To prevent hotspotting, design your row keys so that rows that truly need to be in
+ the same region stay together, while writes overall are spread across multiple regions
+ in the cluster rather than hitting one region at a time. One technique is to salt the row keys.
+ However, using totally random row keys would remove any benefit of HBase's sorted row
+ order and cause very poor performance, as each get or scan would need to query all
+ regions.
+
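The salting technique mentioned above can be sketched in plain Java. This is only an illustration under stated assumptions: the class `SaltedRowKey`, its methods `salt` and `scanPrefixes`, the two-digit prefix format, and the bucket count are all hypothetical and are not part of the HBase API. The prefix is derived from a hash of the key rather than chosen at random, so a single-row get can recompute it, while a range scan has to be repeated once per prefix.

```java
// Illustrative sketch only: SaltedRowKey and its methods are hypothetical
// helpers, not part of the HBase API.
class SaltedRowKey {

    // Prepend a deterministic two-digit bucket prefix derived from the key's hash.
    // Assumes 1 <= buckets <= 100 so the prefix always fits in two digits.
    static String salt(String rowKey, int buckets) {
        int bucket = Math.floorMod(rowKey.hashCode(), buckets);
        return String.format(java.util.Locale.ROOT, "%02d-%s", bucket, rowKey);
    }

    // A range scan over salted keys has to be issued once per bucket prefix,
    // so this returns every prefix a reader would need to cover.
    static String[] scanPrefixes(int buckets) {
        String[] prefixes = new String[buckets];
        for (int i = 0; i < buckets; i++) {
            prefixes[i] = String.format(java.util.Locale.ROOT, "%02d-", i);
        }
        return prefixes;
    }

    public static void main(String[] args) {
        // Contiguous keys land in different buckets instead of one region.
        for (String key : new String[] {"a", "b", "c", "d"}) {
            System.out.println(salt(key, 4));
        }
    }
}
```

With four buckets, the contiguous keys `a`, `b`, `c`, and `d` receive four different prefixes, so their writes would no longer target a single key range. The trade-off is on the read side: the more buckets there are, the more separate scans a range query needs.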
Monotonically Increasing Row Keys/Timeseries Data