Details
- Type: New Feature
- Status: Closed
- Priority: Critical
- Resolution: Won't Fix
- 0.94.0
- None
- None
- None
Description
As is, the feature is only useful for HBase clusters that care about data locality on regionservers, but it can also enable many useful features down the road.
The basic idea is as follows: instead of letting HDFS decide where to replicate data (r=3) by placing blocks on various nodes, it is better to let HBase do so by providing hints to HDFS through the DFS client. That way, instead of replicating data at the block level, we can replicate data at the region level (each region owned by a primary, a secondary and a tertiary regionserver). This has two benefits:
- It can make region failover faster on clusters that benefit from data affinity
- On large clusters with a random block placement policy, it helps reduce the probability of data loss
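The data-loss claim can be made concrete with a simple probability model. The sketch below assumes, purely for illustration, that every replica set is an independent, uniformly random choice of 3 nodes and that exactly 3 distinct nodes fail at once. Under random block placement every block contributes its own replica set; under per-region placement all blocks of a region share one set, so there are far fewer distinct sets for a failure to hit. The cluster sizes are hypothetical, and `p_loss` is an illustrative helper, not an HBase or HDFS API.

```python
from math import comb


def p_loss(num_nodes: int, num_replica_sets: int) -> float:
    """Probability that one simultaneous failure of 3 distinct nodes destroys
    every replica of at least one replica set.

    Hypothetical model: each replica set is an independent, uniformly random
    choice of 3 nodes out of num_nodes.
    """
    if num_nodes < 3:
        raise ValueError("need at least 3 nodes")
    triples = comb(num_nodes, 3)  # number of possible 3-node replica sets
    # A replica set survives unless it equals the failed triple (prob 1/triples).
    return 1.0 - (1.0 - 1.0 / triples) ** num_replica_sets


# Hypothetical cluster: 1,000 datanodes, 10 million blocks, 20,000 regions.
nodes, blocks, regions = 1_000, 10_000_000, 20_000
random_placement = p_loss(nodes, blocks)  # every block picks its own triple
per_region = p_loss(nodes, regions)       # all blocks of a region share one triple
print(f"random block placement: {random_placement:.4%}")
print(f"per-region placement:   {per_region:.4%}")
```

The trade-off is that when a loss does occur under per-region placement, it affects a whole region's files rather than a scattering of blocks.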
The algorithm is as follows:
- Each region in META will have 3 columns which are the preferred regionservers for that region (primary, secondary and tertiary)
- Preferred assignment can be controlled by a config knob
- Upon cluster start, the HMaster will create a mapping from each region to 3 regionservers (using a random hash, current locality, etc.)
- The load balancer would assign regions preferring the primary over the secondary, the secondary over the tertiary, and the tertiary over any other node
- Periodically (say weekly; configurable), the HMaster would run a locality check and make sure its region-to-regionserver mapping is optimal.
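The three master-side steps above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not HBase code: `favored_nodes`, `build_mapping`, `choose_server` and `primary_locality` are hypothetical names, SHA-1 stands in for "a random hash", and picking consecutive servers from the hashed starting point is a simplification chosen only to guarantee three distinct servers. The fallback when no favored node is alive simply takes the smallest server name, where a real balancer would weigh load.

```python
import hashlib
from typing import Dict, List, Sequence, Set


def favored_nodes(region: str, servers: Sequence[str], k: int = 3) -> List[str]:
    """Pick k distinct servers [primary, secondary, tertiary] from a stable
    hash of the region name (hypothetical placement rule)."""
    if len(servers) < k:
        raise ValueError(f"need at least {k} regionservers")
    ordered = sorted(servers)
    digest = hashlib.sha1(region.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(ordered)
    # Take k consecutive servers (wrapping around) so the choices are distinct.
    return [ordered[(start + i) % len(ordered)] for i in range(k)]


def build_mapping(regions: Sequence[str], servers: Sequence[str]) -> Dict[str, List[str]]:
    """Cluster-start step: map every region to its favored regionservers."""
    return {region: favored_nodes(region, servers) for region in regions}


def choose_server(region: str, mapping: Dict[str, List[str]], live: Set[str]) -> str:
    """Balancer step: primary > secondary > tertiary > any other live server."""
    for candidate in mapping.get(region, []):
        if candidate in live:
            return candidate
    if not live:
        raise RuntimeError("no live regionservers")
    # No favored node is available; this sketch takes the smallest live name.
    return min(live)


def primary_locality(region: str, mapping: Dict[str, List[str]],
                     block_hosts: Sequence[Sequence[str]]) -> float:
    """Periodic-check step: fraction of the region's blocks that have a
    replica on the region's primary (1.0 if the region has no blocks)."""
    if not block_hosts:
        return 1.0
    primary = mapping[region][0]
    local = sum(1 for hosts in block_hosts if primary in hosts)
    return local / len(block_hosts)
```

A periodic job could compare `primary_locality` against a configurable threshold and rewrite the mapping for regions that fall below it; since the description stores the mapping in META, the hash only seeds the initial assignment.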
Down the road, this can be enhanced to control region placement in the following cases:
- Mixed hardware SKUs, where some regionservers can hold fewer regions
- Load balancing across tables, where we don't want multiple regions of a table to be assigned to the same regionserver
- Multi-tenancy, where we can restrict the assignment of the regions of some table to a subset of regionservers, so an abusive app cannot take down the whole HBase cluster.
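All three future cases amount to filtering and ordering the candidate regionservers before a region is placed. The sketch below is one hypothetical way to express that; `rank_candidates` and its parameters are illustrative names, not an HBase API. Per-server capacity models mixed hardware SKUs, a per-table region count spreads a table across servers, and an optional allowed set restricts a tenant's table to a subset of regionservers.

```python
from typing import Iterable, List, Mapping, Optional, Set


def rank_candidates(servers: Iterable[str],
                    capacity: Mapping[str, int],
                    load: Mapping[str, int],
                    table_count: Mapping[str, int],
                    allowed: Optional[Set[str]] = None) -> List[str]:
    """Return servers eligible to host one more region of a table, best first.

    capacity:    max regions per server (mixed hardware SKUs)
    load:        current region count per server
    table_count: regions of this table already on each server
    allowed:     servers this table may use (multi-tenancy); None means all
    """
    eligible = [
        s for s in servers
        if (allowed is None or s in allowed) and load.get(s, 0) < capacity[s]
    ]
    # Spread the table first (fewest of its regions), then prefer the most
    # relative headroom, then the name for a deterministic order.
    return sorted(
        eligible,
        key=lambda s: (table_count.get(s, 0), load.get(s, 0) / capacity[s], s),
    )


capacity = {"big1": 100, "big2": 100, "small1": 20}
load = {"big1": 50, "big2": 10, "small1": 20}  # small1 is already full
print(rank_candidates(list(capacity), capacity, load, {"big2": 1}))
# prints ['big1', 'big2']
```

Full servers are filtered out before the ratio is computed, so a server with zero capacity never causes a division by zero.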
Issue Links
- is related to
  - HDFS-4606 HDFS API to move file replicas to caller's location (Open)
  - HDFS-2576 Namenode should have a favored nodes hint to enable clients to have control over block placement. (Closed)
  - HDFS-6133 Add a feature for replica pinning so that a pinned replica will not be moved by Balancer/Mover. (Closed)
  - HDFS-6441 Add ability to exclude/include specific datanodes while balancing (Closed)
- is required by
  - HBASE-5843 Improve HBase MTTR - Mean Time To Recover (Closed)
- is superseded by
  - HBASE-15531 Favored Nodes Enhancements (Closed)
- relates to
  - HBASE-6572 Tiered HFile storage (Closed)
- requires
  - HBASE-8549 Integrate Favored Nodes into StochasticLoadBalancer (Closed)
  - HBASE-9116 Add a view/edit tool for favored node mappings for regions (Closed)