Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Duplicate
- Affects Version/s: 0.23.0
Description
When the HDFS DataNode stores chunks in a local directory, it currently puts all of the chunk files into either one big directory or a collection of directories. However, the directory a given chunk ends up in cannot be determined from its ID, so locating a chunk and keeping directories small both get harder as the number of files grows. This does not scale well.
Similar to the git version control system, HDFS should create a set of top-level directories keyed off a few bits of the chunk ID. Git uses 8 bits (the first two hex digits of the object hash), which gives 256 directories. This substantially cuts down on the number of chunk files in any one directory, which should improve performance for directory operations, while preserving O(1) lookup of chunks.
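As a rough illustration of the proposal, the sketch below derives a chunk's directory from the low-order 8 bits of its ID. The names `chunk_dir`, `chunk_path`, and `DIR_BITS`, the choice of the low-order bits, and the single-level layout are assumptions made for this example. It is not the layout the DataNode actually uses.

```python
import os

# Assumption: 8 bits, mirroring git's 256 object directories.
DIR_BITS = 8


def chunk_dir(root: str, chunk_id: int, bits: int = DIR_BITS) -> str:
    """Return the directory that would hold the chunk with this ID."""
    # Mask off the low-order bits; this also maps negative IDs
    # to a non-negative bucket number.
    bucket = chunk_id & ((1 << bits) - 1)
    # Zero-pad to a fixed number of hex digits so names sort uniformly.
    width = (bits + 3) // 4
    return os.path.join(root, format(bucket, f"0{width}x"))


def chunk_path(root: str, chunk_id: int, bits: int = DIR_BITS) -> str:
    """Return the full path of the chunk file (illustrative file naming)."""
    return os.path.join(chunk_dir(root, chunk_id, bits), f"blk_{chunk_id}")
```

Because the directory is a pure function of the ID, finding a chunk needs no directory scan and no in-memory index: `chunk_path(root, chunk_id)` is computed in constant time, and each directory holds roughly 1/256 of the files when IDs are evenly distributed.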
Issue Links
- duplicates HDFS-6482: Use block ID-based block layout on datanodes (Closed)