Hadoop HDFS / HDFS-244

DFS should provide partition information for blocks, and map/reduce should avoid scheduling mappers on splits from the same file system partition at the same time

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      The summary is a bit long, but the basic idea is to better utilize multiple file system partitions.
      For example, suppose a map/reduce job has 100 splits local to a node, spread across 4 file system
      partitions, and we allow 4 mappers to run concurrently. It is better for those mappers to each work
      on splits from different file system partitions. In the worst case, all the mappers work on splits
      from the same file system partition, and the other three partitions are not utilized at all.
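      The scheduling idea above can be sketched as follows. This is an illustrative toy, not Hadoop code: the class and method names are hypothetical, and it assumes the DFS already exposes, for each split, the partition holding its block. A free mapper slot prefers a split on a partition no running mapper is currently reading, falling back to any remaining split.

```java
import java.util.*;

// Hypothetical sketch (names are illustrative, not the Hadoop API):
// hand out splits so concurrent mappers spread reads across partitions.
class PartitionAwareScheduler {
    // splits grouped by the file system partition holding their block
    private final Map<String, Deque<String>> splitsByPartition = new HashMap<>();
    // partitions currently being read by a running mapper
    private final Set<String> busyPartitions = new HashSet<>();

    void addSplit(String partition, String split) {
        splitsByPartition.computeIfAbsent(partition, p -> new ArrayDeque<>()).add(split);
    }

    // Prefer a split on an idle partition; fall back to any remaining split
    // (the worst case the description warns about, but never an idle slot).
    String nextSplit() {
        for (Map.Entry<String, Deque<String>> e : splitsByPartition.entrySet()) {
            if (!busyPartitions.contains(e.getKey()) && !e.getValue().isEmpty()) {
                busyPartitions.add(e.getKey());
                return e.getValue().poll();
            }
        }
        for (Deque<String> splits : splitsByPartition.values()) {
            if (!splits.isEmpty()) {
                return splits.poll();
            }
        }
        return null; // no splits left on this node
    }

    // Called when a mapper completes, freeing its partition for scheduling.
    void mapperFinished(String partition) {
        busyPartitions.remove(partition);
    }
}
```

      With 4 partitions and 4 concurrent mapper slots, the first four calls to `nextSplit()` each land on a distinct partition, which is exactly the behavior the description asks for.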

        Activity

        eric14 eric baldeschwieler added a comment -

        An easier solution might simply be to schedule more blocks to be read at once. This will saturate the disk system with less complexity...


          People

          • Assignee: Unassigned
          • Reporter: runping Runping Qi
          • Votes: 0
          • Watchers: 1

            Dates

            • Created:
            • Updated: