Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-1443 Improve Datanode startup time
  3. HDFS-1446

Refactor the start-time Directory Tree and Replicas Map constructors to share data and run volume-parallel

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 0.20.2
    • None
    • datanode
    • None
    • datanode startup, volume parallel

    Description

      Refactor the FSDir() and getVolumeMap() call chains in FSDataset, so they share data and run volume-parallel. Currently the two constructors for in-memory directory tree and replicas map run THREE full scans of the entire disk - once in FSDir(), once in recoverTempUnlinkedBlock(), and once in addToReplicasMap(). During each scan, a new File object is created for each of the 100,000 or so items in the native file system (for a 50,000-block node). This impacts GC as well as disk traffic.

      This work item is one of four sub-tasks for HDFS-1443, Improve Datanode startup time.

      Attachments

        Activity

          People

            mattf Matthew Foley
            mattf Matthew Foley
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated: