HBASE-4755

HBase based block placement in DFS

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 0.94.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      As proposed, this feature is useful only for HBase clusters that care about data locality on regionservers, but it can also enable several other features down the road.

      The basic idea is as follows: instead of letting HDFS decide where to place the replicas of each block (r=3) across various datanodes, let HBase decide by passing hints to HDFS through the DFS client. That way, instead of replicating data at the block level, we replicate data at the region level, with each region owned by a primary, a secondary and a tertiary regionserver. This helps in two ways:

      • Region failover can be faster on clusters that benefit from data affinity
      • On large clusters with random block placement policy, this helps reduce the probability of data loss
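The data-loss point can be illustrated with a rough calculation. The symbols below are assumptions for illustration and do not appear in the issue: N datanodes, B blocks each replicated on 3 nodes chosen independently and uniformly at random, R regions each pinned to one triple of nodes, and exactly 3 nodes failing at once. Real HDFS placement is rack-aware, so treat this as indicative only.

```latex
% Probability that a random triple of failed nodes holds all replicas of some block
P_{\text{block-level}} = 1 - \left(1 - \frac{1}{\binom{N}{3}}\right)^{B}

% With region-level placement at most R distinct triples hold data (union bound)
P_{\text{region-level}} \le \frac{R}{\binom{N}{3}}
```

Since a cluster normally has far fewer regions than blocks (R much smaller than B), the second bound is typically much smaller than the first under these assumptions.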

      The algorithm is as follows:

      • Each region in META will have 3 columns which are the preferred regionservers for that region (primary, secondary and tertiary)
      • Preferred assignment can be controlled by a config knob
      • Upon cluster start, HMaster will record a mapping from each region to 3 regionservers (chosen by random hash, from current locality, etc.)
      • The load balancer would assign regions preferring the primary over the secondary, the secondary over the tertiary, and the tertiary over any other node
      • Periodically (say weekly, configurable) the HMaster would run a locality check to make sure its region-to-regionserver map is optimal.
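The steps above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the code in the attached patch: the class `FavoredNodesSketch` and its methods `pickFavored` and `chooseServer` are hypothetical names, servers are plain strings, and the "random hash" is modeled as a shuffle seeded by the region name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch; none of these names come from HBase itself.
final class FavoredNodesSketch {

    /** Picks the primary, secondary and tertiary servers for a region. */
    static List<String> pickFavored(String regionName, List<String> servers) {
        if (servers.size() < 3) {
            throw new IllegalArgumentException("need at least 3 regionservers");
        }
        List<String> shuffled = new ArrayList<>(servers);
        // Seeding with the region name makes the choice repeatable
        // for a given region and a given server list.
        Collections.shuffle(shuffled, new Random(regionName.hashCode()));
        return new ArrayList<>(shuffled.subList(0, 3));
    }

    /** Returns the first live favored server, else any live server, else empty. */
    static Optional<String> chooseServer(List<String> favored, Set<String> liveServers) {
        for (String candidate : favored) { // primary, then secondary, then tertiary
            if (liveServers.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        // No favored server is alive: fall back to any other live node.
        return liveServers.stream().sorted().findFirst();
    }

    public static void main(String[] args) {
        List<String> servers = List.of("rs1", "rs2", "rs3", "rs4", "rs5");
        List<String> favored = pickFavored("regionA", servers);
        System.out.println("favored = " + favored);

        Set<String> live = new HashSet<>(servers);
        live.remove(favored.get(0)); // simulate failure of the primary
        System.out.println("assigned = " + chooseServer(favored, live).orElse("none"));
    }
}
```

In a real implementation the three servers would be persisted in the META columns and the periodic locality check would recompute them; the sketch only shows the preference order.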

      Down the road, this can be enhanced to control region placement in the following cases:

      • Mixed hardware SKU where some regionservers can hold fewer regions
      • Load balancing across tables, where we don't want multiple regions of a table to be assigned to the same regionserver
      • Multi-tenancy, where we can restrict the assignment of the regions of some table to a subset of regionservers, so an abusive app cannot take down the whole HBase cluster.
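One way to picture the three cases above is as filters applied to the candidate servers before assignment. The sketch below is an assumption-laden illustration, not a proposed HBase API: `PlacementConstraintsSketch`, `eligibleServers` and all parameter names are hypothetical, and the per-table rule is modeled as strict anti-affinity (at most one region of a table per server), which is only one reading of the second bullet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from HBase itself.
final class PlacementConstraintsSketch {

    /**
     * Returns the servers that may host a new region of the given table.
     *
     * @param allowedByTable per-table allow-list (multi-tenancy); a table
     *                       without an entry may use any server
     * @param capacity       maximum regions per server (mixed hardware);
     *                       a server without an entry is unlimited
     * @param hosted         table name of every region currently on each server
     */
    static List<String> eligibleServers(String table,
                                        List<String> servers,
                                        Map<String, Set<String>> allowedByTable,
                                        Map<String, Integer> capacity,
                                        Map<String, List<String>> hosted) {
        List<String> result = new ArrayList<>();
        Set<String> allowed = allowedByTable.get(table);
        for (String server : servers) {
            List<String> current = hosted.getOrDefault(server, List.of());
            boolean permitted = allowed == null || allowed.contains(server);
            boolean hasRoom = current.size() < capacity.getOrDefault(server, Integer.MAX_VALUE);
            boolean tableAbsent = !current.contains(table);
            if (permitted && hasRoom && tableAbsent) {
                result.add(server);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> servers = List.of("rs1", "rs2", "rs3", "rs4");
        Map<String, Set<String>> allowed = Map.of("noisy", Set.of("rs1", "rs2", "rs3"));
        Map<String, Integer> capacity = Map.of("rs2", 1);
        Map<String, List<String>> hosted = Map.of(
                "rs1", List.of("noisy"),
                "rs2", List.of("other"),
                "rs3", List.of("other"));
        System.out.println(eligibleServers("noisy", servers, allowed, capacity, hosted));
        System.out.println(eligibleServers("other", servers, allowed, capacity, hosted));
    }
}
```

The filtered list could then be intersected with a region's preferred servers, so the constraints narrow the choice without replacing the primary, secondary and tertiary ordering.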

      Attachments

      1. 4755-wip-1.patch (131 kB, Devaraj Das)
      2. hbase-4755-notes.txt (3 kB, Devaraj Das)


            People

            • Assignee: Christopher Gist
            • Reporter: Karthik Ranganathan
            • Votes: 1
            • Watchers: 45
