Details
- Type: Improvement
- Status: Reopened
- Priority: Major
- Resolution: Unresolved
Description
When multiple file system partitions are configured for a data node's block storage,
the node uses a strict round-robin policy to decide which partition receives the next block.
This can lead to anomalous cases in which the blocks of a file are not evenly distributed across
the partitions. For example, when distcp copies files with four mappers running concurrently on each node,
those four mappers write to DFS at about the same rate, so their block writes can interleave in lockstep.
If the data node has four file system partitions configured, each mapper may then keep writing
all of its blocks to the same partition.
A simple random placement policy avoids such anomalies and has no obvious drawbacks.
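For illustration, here is a minimal, self-contained Java sketch contrasting the two policies. It is not the actual datanode code; the class and method names are hypothetical. It shows how a shared round-robin counter lets writers whose block writes interleave in lockstep land on the same partition every time, while independent random choices spread each writer's blocks across partitions.
{code:java}
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch only: two policies for choosing which local
 *  partition receives the next block written by a data node. */
public class VolumeChoosingSketch {

    /** Strict round robin: one shared counter cycles through the partitions. */
    static class RoundRobinPolicy {
        private final AtomicInteger next = new AtomicInteger(0);

        String choose(List<String> partitions) {
            // floorMod keeps the index non-negative if the counter wraps.
            int i = Math.floorMod(next.getAndIncrement(), partitions.size());
            return partitions.get(i);
        }
    }

    /** Random placement: each block independently picks a partition. */
    static class RandomPolicy {
        private final Random rand = new Random();

        String choose(List<String> partitions) {
            return partitions.get(rand.nextInt(partitions.size()));
        }
    }

    public static void main(String[] args) {
        List<String> partitions = List.of("/d0", "/d1", "/d2", "/d3");

        // Four writers (e.g. four distcp mappers on one node) whose block
        // writes interleave in lockstep. Under round robin, writer k always
        // gets partition k, so each writer's file ends up on a single disk.
        RoundRobinPolicy rr = new RoundRobinPolicy();
        for (int block = 0; block < 3; block++) {
            for (int writer = 0; writer < 4; writer++) {
                System.out.printf("round-robin: writer %d -> %s%n",
                        writer, rr.choose(partitions));
            }
        }

        // Under random placement the same interleaving spreads each
        // writer's blocks across all partitions with high probability.
        RandomPolicy rp = new RandomPolicy();
        for (int block = 0; block < 3; block++) {
            for (int writer = 0; writer < 4; writer++) {
                System.out.printf("random: writer %d -> %s%n",
                        writer, rp.choose(partitions));
            }
        }
    }
}
{code}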
Attachments
Issue Links
- depends upon: HADOOP-2559 DFS should place one replica per rack (Closed)
- is related to: HADOOP-2437 final map output not evenly distributed across multiple disks (Closed)