Hadoop HDFS / HDFS-325

DFS should not use round robin policy in determining on which volume (file system partition) to allocate the next block


Details

    • Type: Improvement
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved

    Description

      When multiple file system partitions are configured for the data storage of a data node,
      the data node uses a strict round robin policy to decide which partition to use for writing the next block.
      This may result in anomalous cases in which the blocks of a file are not evenly distributed across
      the partitions. For example, when we use distcp to copy files with each node running 4 mappers concurrently,
      those 4 mappers write to DFS at about the same rate. Thus, it is possible that the 4 mappers write out
      their blocks in an interleaved fashion. If there are 4 file system partitions configured for the local
      data node, each mapper may then keep writing its blocks onto the same file system partition.

      A simple random placement policy would avoid such anomalous cases and does not have any obvious drawbacks.
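
      To make the contrast concrete, here is a minimal, hypothetical Java sketch of the two policies.
      It is not the attached patch; the class name, method names, and generic volume list are
      illustrative assumptions rather than the data node's actual API.

          import java.util.List;
          import java.util.Random;

          // Illustrative sketch only: the generic volume list stands in for the data
          // node's configured file system partitions; this is not the attached patch.
          class VolumeChooser {
              private final Random random = new Random();
              private int nextIndex = 0; // state used only by the round robin variant

              // Strict round robin: callers cycle through the volumes in order. If four
              // writers interleave their blocks across four partitions, each writer keeps
              // landing on the same partition, which is the anomaly described above.
              synchronized <V> V chooseRoundRobin(List<V> volumes) {
                  V chosen = volumes.get(nextIndex);
                  nextIndex = (nextIndex + 1) % volumes.size();
                  return chosen;
              }

              // Random placement: each block goes to a uniformly chosen volume, so no
              // single writer can stay pinned to one partition.
              synchronized <V> V chooseRandom(List<V> volumes) {
                  return volumes.get(random.nextInt(volumes.size()));
              }
          }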

      Attachments

        1. randomDatanodePartition.patch (0.7 kB), attached by Dhruba Borthakur



            People

              Assignee: Dhruba Borthakur (dhruba)
              Reporter: Runping Qi (runping)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated: