[HBASE-26304] Reflect out-of-band locality improvements in served requests - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.5.0, 3.0.0-alpha-2
Component/s: None
Labels: None
Hadoop Flags: Reviewed

Description

Edit: Description updated to avoid needing to read the full investigation laid out in the comments.

Once the LocalityHealer has improved the locality of a StoreFile (by moving blocks onto the correct host), the Reader's DFSInputStream and the Region's localityIndex metric must be refreshed. Without refreshing the DFSInputStream, the improved locality will not improve latencies. In fact, the DFSInputStream may try to fetch blocks that have moved, resulting in a ReplicaNotFoundException. This is automatically retried, but the retry will temporarily increase long-tail latencies by an amount that depends on the configured backoff strategy.

In the original LocalityHealer design, I created a new RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list of region names and, for each store in each region, re-opens the underlying StoreFile if its locality has changed. This implementation was complicated, both in integrating callbacks into the HDFS Dispatcher and in safely re-opening StoreFiles without impacting reads or caches.

In working to port the LocalityHealer to the Apache projects, I'm taking a different approach:

The part of the LocalityHealer that moves blocks will be an HDFS project contribution.
As such, the DFSClient should be able to more gracefully recover from block moves.
Additionally, HBase has some caches of block locations for locality reporting and the balancer. Those need to be kept up-to-date.

The DFSClient improvements are covered in HDFS-16261 and ~~HDFS-16262~~. As such, this issue is now about keeping HBase's block location caches up to date.

I considered a few different approaches, but the most elegant one I could come up with was to tie the HDFSBlockDistribution metrics directly to the underlying DFSInputStream of each StoreFile's initialReader. That way, our locality metrics exactly reflect the block locations that our reads are going through. This also means that our locality metrics will naturally adjust as the DFSInputStream adjusts to block moves.
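As a rough illustration of that idea (not the code from the linked pull requests), the sketch below computes a locality index from whatever block list a reader's stream currently holds. The `Block` class, the `Supplier` standing in for the stream's view of block locations, and the class name `StreamBackedLocality` are all hypothetical; the sketch also assumes locality is measured as the fraction of bytes that have a replica on the given host.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: a locality metric that always reads the block
// locations the reader's stream is currently using, instead of a copy
// captured when the file was opened.
class StreamBackedLocality {
    /** Hypothetical stand-in for one located block: its length and replica hosts. */
    static final class Block {
        final long length;
        final List<String> hosts;

        Block(long length, String... hosts) {
            this.length = length;
            this.hosts = Arrays.asList(hosts);
        }
    }

    // Assumption: stands in for asking the stream for its current block locations.
    private final Supplier<List<Block>> currentBlocks;

    StreamBackedLocality(Supplier<List<Block>> currentBlocks) {
        this.currentBlocks = currentBlocks;
    }

    /** Fraction of bytes (0.0 to 1.0) that have a replica on the given host. */
    float localityIndex(String host) {
        long total = 0;
        long local = 0;
        for (Block b : currentBlocks.get()) {
            total += b.length;
            if (b.hosts.contains(host)) {
                local += b.length;
            }
        }
        return total == 0 ? 0f : (float) local / total;
    }
}
```

Because the metric re-reads the supplier on every call, a block move that the stream has picked up shows up in the next locality reading without re-opening the file.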

Once we have accurate locality metrics on the RegionServer, the Balancer's cache can easily be invalidated via our usual heartbeat methods. RegionServers report to the HMaster periodically, which keeps a ClusterMetrics object up to date. Right before each balancer invocation, the balancer is updated with the latest ClusterMetrics. At this time, we compare the old ClusterMetrics to the new, and invalidate the caches for any regions whose locality has changed.
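The compare-and-invalidate step could look roughly like the following. This is a minimal sketch under stated assumptions: the two metrics snapshots are reduced to plain maps from region name to reported locality, and the class and method names (`LocalityCacheInvalidation`, `regionsWithChangedLocality`, `invalidate`) are made up for illustration rather than taken from HBase.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: diff two locality snapshots and drop cached
// entries for every region whose reported locality changed.
class LocalityCacheInvalidation {
    /** Returns regions whose locality in newLocality differs from (or is absent in) oldLocality. */
    static Set<String> regionsWithChangedLocality(
            Map<String, Float> oldLocality, Map<String, Float> newLocality) {
        Set<String> changed = new HashSet<>();
        for (Map.Entry<String, Float> entry : newLocality.entrySet()) {
            Float previous = oldLocality.get(entry.getKey());
            // equals(null) is false, so regions missing from the old snapshot count as changed.
            if (!entry.getValue().equals(previous)) {
                changed.add(entry.getKey());
            }
        }
        return changed;
    }

    /** Removes the cached entries (keyed by region name) for the given regions. */
    static void invalidate(Map<String, ?> cache, Set<String> regions) {
        cache.keySet().removeAll(regions);
    }
}
```

Only regions whose locality actually moved lose their cached block distribution, so unchanged regions keep their cache entries across balancer runs.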

Issue Links

links to

GitHub Pull Request #3704

GitHub Pull Request #3803

GitHub Pull Request #3895

People

Assignee: Bryan Beaudreault

Reporter: Bryan Beaudreault

Votes: 0

Watchers: 5

Dates

Created: 28/Sep/21 13:26

Updated: 07/Mar/22 15:11

Resolved: 04/Dec/21 07:37