Details
- Type: New Feature
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
I’m proposing a fairly substantial new tool for quickly and efficiently relieving latency problems caused by poor data locality. This is especially useful in cloud environments, and it has been highly impactful at HubSpot, where we run thousands of RegionServers across 40+ multi-zone clusters. Please see the attached design doc for details on the problem, why compactions alone are not enough to solve it, and an overview (with diagram) of the components that make up this new tool.
As spec'd, this new feature would require submitting a new tool to the HDFS project. Once we reach consensus on the approach, I can create the relevant upstream HDFS JIRA.
See the design doc here: https://docs.google.com/document/d/1GLGzrF1QLyhyOCr2fFw0LCymnyFPT0ktShTaaXn-75A/edit#heading=h.aswo7shg76b6
Note: This issue is an attempt to upstream a tool that has been fully deployed on all production clusters at HubSpot for about 6 months. It has been very effective for us as currently implemented, but it will need to be reorganized and redesigned somewhat to fit into the HBase/HDFS projects. As such, I'd like feedback on the design before putting too much effort into porting multiple components into PRs.
Issue Links
- is related to
  - HDFS-16261 Configurable grace period around invalidation of replaced blocks (Open)
  - HDFS-16155 Allow configurable exponential backoff in DFSInputStream refetchLocations (Open)
  - HDFS-16262 Async refresh of cached locations in DFSInputStream (Resolved)
- links to