  1. Hadoop HDFS
  2. HDFS-3290

Use a better local directory layout for the datanode

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 0.23.0
    • None
    • None
    • None

    Description

      When the HDFS DataNode stores chunks (blocks) in a local directory, it currently puts all of the chunk files into either one big directory or a collection of directories. However, given a chunk's ID, there is no way to know which directory the chunk will end up in. As the number of files increases, this does not scale well.

      Similar to the git version control system, HDFS should create a set of top-level directories keyed off a few bits of the chunk ID. Git uses 8 bits (the first two hex digits of the object hash), which yields 256 directories. This substantially cuts down the number of chunk files in any one directory, which improves performance, while preserving O(1) lookup of chunks, since the directory can be computed directly from the ID.
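
      The proposed layout can be sketched as follows. This is only an illustration, not the DataNode's actual code: the class name BlockDirLayout, its methods, the choice of the low-order 8 bits, and the two-hex-digit directory names are all assumptions made for the example. Only the blk_<id> file naming follows existing HDFS convention.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper illustrating the proposal; not part of Hadoop. */
final class BlockDirLayout {
    // Assumption: use the low-order 8 bits of the ID, giving 256 directories (as git does).
    static final int DIR_BITS = 8;
    static final int DIR_MASK = (1 << DIR_BITS) - 1;

    private BlockDirLayout() {}

    /** Returns the directory index in [0, 255] for a block ID, including negative IDs. */
    static int dirIndex(long blockId) {
        return (int) (blockId & DIR_MASK);
    }

    /** Computes the block file's path under the storage root without scanning any directory. */
    static Path blockPath(Path root, long blockId) {
        // Two lowercase hex digits, e.g. "00" through "ff" (naming is an assumption).
        String dir = String.format("%02x", dirIndex(blockId));
        return root.resolve(dir).resolve("blk_" + blockId);
    }
}
```

      For example, BlockDirLayout.blockPath(Paths.get("data"), 4660L) resolves to data/34/blk_4660, because 4660 is 0x1234 and its low byte is 0x34. Using more bits, or a second directory level, would trade fewer files per directory against a larger number of directories.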

            People

              Assignee: Colin McCabe (cmccabe)
              Reporter: Colin McCabe (cmccabe)
              Votes: 0
              Watchers: 10