Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.20.2
Fix Version/s: None
Component/s: None
Labels: datanode startup, volume parallel
Description
Refactor the FSDir() and getVolumeMap() call chains in FSDataset so that they share data and run volume-parallel. Currently, the two constructors for the in-memory directory tree and the replicas map together run THREE full scans of the entire disk: once in FSDir(), once in recoverTempUnlinkedBlock(), and once in addToReplicasMap(). Each scan creates a new File object for every item in the native file system. On a node holding 50,000 blocks that is roughly 100,000 items, because each block has both a data file and a .meta file. The repeated scans therefore add garbage-collection pressure as well as disk traffic.
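A minimal sketch of the proposed direction, assuming the goal is one directory walk per volume with all volumes walked concurrently. A single pass pairs each block file with its .meta file, so one result could feed both the directory tree and the replicas map. The class `VolumeScanSketch`, the `Replica` holder, and the methods `scanVolume()`/`scanAllVolumes()` are hypothetical and are not FSDataset's API. The file-name patterns `blk_<id>` and `blk_<id>_<genstamp>.meta` are assumed to match the on-disk layout. The sketch does not build the FSDir tree and does not handle the temp/unlinked recovery that recoverTempUnlinkedBlock() performs.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not the FSDataset code: one directory walk per
// volume, with all volumes walked in parallel.
public class VolumeScanSketch {

    // Hypothetical holder pairing a block file with its .meta file.
    static class Replica {
        File blockFile;
        File metaFile;
    }

    // Walks one volume exactly once and returns blockName -> Replica.
    static Map<String, Replica> scanVolume(File volumeRoot) {
        Map<String, Replica> replicas = new HashMap<>();
        Deque<File> pending = new ArrayDeque<>();
        pending.push(volumeRoot);
        while (!pending.isEmpty()) {
            File[] children = pending.pop().listFiles(); // one listing per directory
            if (children == null) {
                continue; // unreadable, or not a directory
            }
            for (File f : children) {
                String name = f.getName();
                if (f.isDirectory()) {
                    pending.push(f);
                } else if (name.startsWith("blk_") && name.endsWith(".meta")) {
                    // Assumed naming: blk_<id>_<genstamp>.meta -> blk_<id>
                    String base = name.substring(0, name.length() - ".meta".length());
                    String blockName = base.substring(0, base.lastIndexOf('_'));
                    replicas.computeIfAbsent(blockName, k -> new Replica()).metaFile = f;
                } else if (name.startsWith("blk_")) {
                    replicas.computeIfAbsent(name, k -> new Replica()).blockFile = f;
                }
            }
        }
        return replicas;
    }

    // Scans all volumes concurrently, one task per volume; results keep volume order.
    static List<Map<String, Replica>> scanAllVolumes(List<File> volumes)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, volumes.size()));
        try {
            List<Future<Map<String, Replica>>> futures = new ArrayList<>();
            for (File volume : volumes) {
                futures.add(pool.submit(() -> scanVolume(volume)));
            }
            List<Map<String, Replica>> results = new ArrayList<>();
            for (Future<Map<String, Replica>> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

One thread per volume is used on the assumption that each volume usually sits on its own disk, so the parallel walks do not compete for the same spindle. Each file is visited and wrapped in a File object once per startup rather than three times.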
This work item is one of four sub-tasks for HDFS-1443, Improve Datanode startup time.