Hadoop HDFS / HDFS-1443

Improve Datanode startup time



    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.20.2
    • Fix Version/s: None
    • Component/s: datanode
    • Labels: None
    • Tags: datanode startup, volume parallel, hard links


      One of the factors slowing down cluster restart is the startup time for the Datanodes. In particular, if an Upgrade is needed, each Datanode must create a Snapshot, which can take 5-15 minutes per volume, with the volumes processed serially. Thus, for a 4-disk Datanode, it may be 45 minutes before it is ready to send its initial Block Report to the Namenode. This is an umbrella bug for the following four pieces of work to improve Datanode startup time:

      1. Batch the calls in DataStorage to FileUtil.createHardLink(), so that it is called once per directory instead of once per file. This is the biggest villain, responsible for 90% of that 45-minute delay. See the subordinate bug for details.

      2. Refactor the Upgrade process in DataStorage to run volume-parallel. A bug is already open for this, HDFS-270, and the volume-parallel work in DirectoryScanner from HDFS-854 is a good foundation to build on.

      3. Refactor the FSDir() and getVolumeMap() call chains in FSDataset so that they share data and run volume-parallel. Currently, the two constructors for the in-memory directory tree and the replicas map run THREE full scans of the entire disk: once in FSDir(), once in recoverTempUnlinkedBlock(), and once in addToReplicasMap(). During each scan, a new File object is created for each of the 100,000 or so items in the native file system (for a 50,000-block node, since each block has both a data file and a .meta file). This impacts GC as well as disk traffic.

      4. Make getGenerationStampFromFile() more efficient. Currently this routine is called by addToReplicasMap() for every block file in the directory tree, and on every call it walks the listing of the file's containing directory, so the cost per directory grows quadratically with the number of files in it. There is a simple refactoring that makes this unnecessary.
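      Item 1 above can be illustrated with a small sketch. It assumes, as in Hadoop code of this era, that each hard link is made by forking an external `ln` process, so the number of forks is what batching saves. The class `HardLinkBatcher`, its methods, and the `maxFilesPerCall` cap are hypothetical names for illustration, not the DataStorage or FileUtil API. The batched form relies on the POSIX `ln source_file... target_dir` form, which links several files into an existing directory in one call.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of item 1: build `ln` command lines.
 * Assumption: every command becomes one forked `ln` process, so the size of
 * the returned list is the number of processes that would be forked.
 */
final class HardLinkBatcher {

    /** Current pattern: one `ln src dst` command (one fork) per file. */
    static List<List<String>> perFileCommands(List<String> names, String srcDir, String dstDir) {
        List<List<String>> commands = new ArrayList<>();
        for (String name : names) {
            List<String> cmd = new ArrayList<>();
            cmd.add("ln");
            cmd.add(srcDir + "/" + name);
            cmd.add(dstDir + "/" + name);
            commands.add(cmd);
        }
        return commands;
    }

    /**
     * Batched pattern: `ln src1 src2 ... dstDir`. POSIX ln accepts several
     * sources when the last operand is an existing directory. maxFilesPerCall
     * keeps each command under the OS argument-length limit.
     */
    static List<List<String>> batchedCommands(List<String> names, String srcDir, String dstDir,
                                              int maxFilesPerCall) {
        if (maxFilesPerCall < 1) {
            throw new IllegalArgumentException("maxFilesPerCall must be >= 1");
        }
        List<List<String>> commands = new ArrayList<>();
        for (int start = 0; start < names.size(); start += maxFilesPerCall) {
            int end = Math.min(start + maxFilesPerCall, names.size());
            List<String> cmd = new ArrayList<>();
            cmd.add("ln");
            for (String name : names.subList(start, end)) {
                cmd.add(srcDir + "/" + name);
            }
            cmd.add(dstDir); // target directory goes last
            commands.add(cmd);
        }
        return commands;
    }
}
```

      Each returned list could be passed to `new ProcessBuilder(cmd).start()`. For a directory holding N files, the per-file form forks N processes, while the batched form forks ceil(N / maxFilesPerCall).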
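      A minimal sketch of the volume-parallel pattern from item 2, assuming each volume's work (for example a hypothetical per-volume upgrade step) is independent of the others. `VolumeParallel.runPerVolume` is an illustrative helper built on `java.util.concurrent`, not the DataStorage or DirectoryScanner code. Run serially, the wall-clock time is roughly the sum of the per-volume times; run in parallel on separate disks, it approaches the time of the slowest single volume, assuming the disks do not share a bottleneck.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch of item 2: run one task per volume, all at once. */
final class VolumeParallel {

    static <V, R> List<R> runPerVolume(List<V> volumes, Function<V, R> work)
            throws InterruptedException, ExecutionException {
        // One thread per volume; assumes each volume sits on its own disk,
        // so the tasks do not compete for the same I/O queue.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, volumes.size()));
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (V volume : volumes) {
                Callable<R> task = () -> work.apply(volume);
                futures.add(pool.submit(task));
            }
            // Collect in volume order; get() rethrows a task's failure
            // wrapped in ExecutionException.
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdownNow(); // interrupts anything still running after a failure
        }
    }
}
```

      Results come back in volume order, and a failure on any volume surfaces as an `ExecutionException` when that volume's result is collected; `shutdownNow()` in the `finally` block then interrupts any tasks still running.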
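      For item 3, one way to picture "share data" is to list every directory exactly once and derive both structures from that single listing. In this sketch, `scanOnce` does the only disk walk, and `blocksPerDirectory` and `blockLocations` are simplified, hypothetical stand-ins for the in-memory directory tree and the replicas map; the real FSDir() and getVolumeMap() code records much more. It still creates one File object per entry, but once per startup rather than once in each of three scans. The `blk_` and `.meta` name tests are assumptions about the on-disk layout.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of item 3: one disk walk, two derived views. */
final class SingleScan {

    /** Lists each directory once; value = sorted names of its regular files. */
    static Map<File, List<String>> scanOnce(File root) {
        Map<File, List<String>> listing = new LinkedHashMap<>();
        Deque<File> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            File dir = pending.pop();
            File[] children = dir.listFiles(); // the only listing of this directory
            if (children == null) {
                continue; // not a directory, or unreadable
            }
            List<String> files = new ArrayList<>();
            for (File child : children) {
                if (child.isDirectory()) {
                    pending.push(child); // e.g. subdir0, subdir1, ...
                } else {
                    files.add(child.getName());
                }
            }
            Collections.sort(files);
            listing.put(dir, files);
        }
        return listing;
    }

    /** Assumed naming: block files start with "blk_"; metadata files end in ".meta". */
    static boolean isBlockFile(String name) {
        return name.startsWith("blk_") && !name.endsWith(".meta");
    }

    /** View 1 (stand-in for the directory tree): block-file count per directory. */
    static Map<File, Integer> blocksPerDirectory(Map<File, List<String>> listing) {
        Map<File, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<File, List<String>> e : listing.entrySet()) {
            int n = 0;
            for (String name : e.getValue()) {
                if (isBlockFile(name)) n++;
            }
            counts.put(e.getKey(), n);
        }
        return counts;
    }

    /** View 2 (stand-in for the replicas map): block name -> containing directory. */
    static Map<String, File> blockLocations(Map<File, List<String>> listing) {
        Map<String, File> where = new HashMap<>();
        for (Map.Entry<File, List<String>> e : listing.entrySet()) {
            for (String name : e.getValue()) {
                if (isBlockFile(name)) where.put(name, e.getKey());
            }
        }
        return where;
    }
}
```

      Running one `scanOnce` per volume, each on its own thread, would cover the volume-parallel half of item 3.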
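      Item 4 does not spell out its refactoring, so what follows is only one plausible approach, not necessarily the one intended. It assumes the on-disk naming `blk_<id>` for a block file and `blk_<id>_<genstamp>.meta` for its metadata file, and uses hypothetical names (`GenStampIndex`, `lookupByScan`, `index`). `lookupByScan` mimics the current pattern of walking the directory listing once per block file, which costs O(n²) for a directory of n files; `index` walks the listing once and builds a map, so each later lookup is a single hash probe.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of item 4. Assumed naming: block file blk_<id>,
 * metadata file blk_<id>_<genstamp>.meta in the same directory.
 */
final class GenStampIndex {

    private static final String PREFIX = "blk_";
    private static final String META_SUFFIX = ".meta";

    /** Current pattern: rescan the whole listing for each block file, O(n) per call. */
    static long lookupByScan(String blockName, String[] listing) {
        for (String name : listing) {
            if (name.startsWith(blockName + "_") && name.endsWith(META_SUFFIX)) {
                return parseGenStamp(name);
            }
        }
        return -1; // illustrative "not found" marker
    }

    /** Alternative: one pass over the listing builds block name -> generation stamp. */
    static Map<String, Long> index(String[] listing) {
        Map<String, Long> stamps = new HashMap<>();
        for (String name : listing) {
            if (!name.startsWith(PREFIX) || !name.endsWith(META_SUFFIX)) continue;
            long gs = parseGenStamp(name);
            if (gs >= 0) {
                String core = name.substring(0, name.length() - META_SUFFIX.length());
                stamps.put(core.substring(0, core.lastIndexOf('_')), gs);
            }
        }
        return stamps;
    }

    /** Parses <genstamp> from blk_<id>_<genstamp>.meta; returns -1 if absent or malformed. */
    static long parseGenStamp(String metaName) {
        String core = metaName.substring(0, metaName.length() - META_SUFFIX.length());
        int sep = core.lastIndexOf('_');
        if (sep < PREFIX.length()) return -1; // only the '_' of "blk_": no stamp present
        try {
            return Long.parseLong(core.substring(sep + 1));
        } catch (NumberFormatException e) {
            return -1;
        }
    }
}
```

      A caller in the position of addToReplicasMap() could build the index once per directory listing and then fetch each block file's stamp with `stamps.get(blockName)`. Taking the last `_` as the separator also handles negative block IDs such as `blk_-7_1002.meta`.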




            Assignee: Matthew Foley (mattf)
            Reporter: Matthew Foley (mattf)
            Votes: 0
            Watchers: 14