Hadoop HDFS / HDFS-325

DFS should not use round robin policy in determining on which volume (file system partition) to allocate the next block

    Details

    • Type: Improvement
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      When multiple file system partitions are configured for the data storage of a data node,
      it uses a strict round robin policy to decide which partition to use for writing the next block.
      This may result in anomalous cases in which the blocks of a file are not evenly distributed
      across the partitions. For example, when we use distcp to copy files with 4 mappers running
      concurrently on each node, those 4 mappers write to DFS at about the same rate. Thus, it is
      possible that the 4 mappers write out blocks in an interleaved fashion. If there are 4 file
      system partitions configured for the local data node, it is possible that each mapper will
      continue to write its blocks onto the same file system partition.

      A simple random placement policy would avoid such anomalous cases and does not have any
      obvious drawbacks; a sketch contrasting the two policies follows.
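      As an illustration only (a minimal sketch with invented names, not the actual DataNode code),
      consider 4 volumes and 4 writers whose block allocations interleave in lock-step: round robin
      pins each writer to one partition, while a uniformly random pick does not.

```java
import java.util.Random;

// Hypothetical volume chooser for illustration; not the real DataNode code.
// With 4 volumes and 4 writers whose block allocations interleave in
// lock-step, round robin hands writer i the volume (i + 4k) % 4 == i every
// time, pinning each writer to one partition. A random pick is uncorrelated.
public class VolumeChoice {
    private static final int NUM_VOLUMES = 4;

    private int next = 0;                       // round-robin cursor
    private final Random random = new Random();

    int roundRobin() {
        int v = next;
        next = (next + 1) % NUM_VOLUMES;
        return v;
    }

    int randomPick() {
        return random.nextInt(NUM_VOLUMES);
    }

    public static void main(String[] args) {
        VolumeChoice chooser = new VolumeChoice();
        // Blocks arrive interleaved from writers 0..3; under round robin,
        // writer i always lands on volume i.
        for (int block = 0; block < 8; block++) {
            int writer = block % 4;
            System.out.printf("writer %d -> round robin: volume %d, random: volume %d%n",
                    writer, chooser.roundRobin(), chooser.randomPick());
        }
    }
}
```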


          Activity

          Runping Qi added a comment -

          A similar problem, due to the round robin placement policy, happens for map output data.

          dhruba borthakur added a comment -

          This makes the DataNode pick a disk randomly for allocating a new block.

          Raghu Angadi added a comment -

          Random partition selection is fine and the patch looks fine. But with 4 partitions and
          two concurrent writers, there is a 25% probability that both write to the same partition;
          with 3 writers it becomes 62.5% (that at least 2 are writing to the same disk), about 90%
          for 4, etc. If that is acceptable, then this patch is fine. Assuming these apps are
          typically I/O bound, this sounds like a pretty large penalty.

          But I don't see how this fixes the problem reported in the description; actually, I did
          not quite understand the problem anyway.
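
          These figures follow birthday-problem arithmetic; a quick sanity check, assuming 4
          partitions chosen uniformly and independently by each writer:

```java
// Sanity check of the collision probabilities quoted above, assuming 4
// partitions chosen uniformly and independently by each writer.
// P(all writers on distinct partitions) = 4!/(4-n)! / 4^n; the chance of
// some shared disk is its complement: 25%, 62.5%, 90.6% for n = 2, 3, 4.
public class CollisionOdds {
    public static void main(String[] args) {
        int partitions = 4;
        for (int writers = 2; writers <= 4; writers++) {
            double allDistinct = 1.0;
            for (int i = 0; i < writers; i++) {
                allDistinct *= (double) (partitions - i) / partitions;
            }
            System.out.printf("%d writers: P(some disk shared) = %.1f%%%n",
                    writers, 100 * (1 - allDistinct));
        }
    }
}
```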

          Raghu Angadi added a comment -

          A better (but much more complicated) policy could be to select a random partition from among the least loaded disks.
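
          A sketch of that idea, with invented load numbers standing in for a real per-disk metric
          (queued writes or outstanding bytes, say):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustration of the "random among least loaded" idea, not actual
// DataNode code. "load" is a hypothetical stand-in for a real per-disk
// metric such as queued writes or outstanding bytes.
public class LeastLoadedChooser {
    private final Random random = new Random();

    int choose(long[] load) {
        long min = Long.MAX_VALUE;
        for (long l : load) {
            min = Math.min(min, l);
        }
        // Collect all volumes tied for the minimum load, then pick
        // uniformly at random among them.
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < load.length; i++) {
            if (load[i] == min) {
                candidates.add(i);
            }
        }
        return candidates.get(random.nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        // Volumes 1 and 3 are tied for least loaded; one of them wins.
        long[] load = {500, 100, 300, 100};
        System.out.println("chose volume " + new LeastLoadedChooser().choose(load));
    }
}
```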

          Hairong Kuang added a comment -

          If the new block allocation strategy proposed in HADOOP-2559 removes the local node copy, the problem described in this JIRA will go away.

          Runping Qi added a comment -

          Is HADOOP-2559 accepted? Will it be in 0.17?

          dhruba borthakur added a comment -

          If HADOOP-2559 gets committed, I plan on closing this JIRA as "won't fix".

          dhruba borthakur added a comment -

          Duplicate of HADOOP-2559.

          Runping Qi added a comment -

          Have we (in HADOOP-2559) decided not to place the first block replica on the local node?

          dhruba borthakur added a comment -

          From my understanding, HADOOP-2559 places the first replica on a random node on the local rack. I will check with Lohit to confirm this one.

          Lohit Vijayarenu added a comment -

          Dhruba, it was decided that we commit patch1 in HADOOP-2559, which places the first replica on the local node, the second replica on a node on a different rack, and the third on the same rack as the 2nd replica but on a different node.
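
          Schematically, that placement rule looks like the following; the Node type and the
          cluster list are invented for illustration and are not the real NameNode structures:

```java
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

// Toy schematic of the HADOOP-2559 patch1 rule as described above; the
// Node record and cluster list are invented and do not reflect the real
// NameNode data structures.
public class ReplicaPlacement {
    record Node(String name, String rack) {}

    static final Random RANDOM = new Random();

    static Node pick(List<Node> candidates) {
        return candidates.get(RANDOM.nextInt(candidates.size()));
    }

    static List<Node> place(Node writer, List<Node> cluster) {
        Node first = writer;                        // 1st replica: local node
        Node second = pick(cluster.stream()         // 2nd: node on a different rack
                .filter(n -> !n.rack().equals(first.rack()))
                .collect(Collectors.toList()));
        Node third = pick(cluster.stream()          // 3rd: 2nd's rack, different node
                .filter(n -> n.rack().equals(second.rack()) && !n.equals(second))
                .collect(Collectors.toList()));
        return List.of(first, second, third);
    }

    public static void main(String[] args) {
        List<Node> cluster = List.of(
                new Node("a1", "rackA"), new Node("a2", "rackA"),
                new Node("b1", "rackB"), new Node("b2", "rackB"));
        System.out.println(place(cluster.get(0), cluster));
    }
}
```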

          dhruba borthakur added a comment -

          It appears that HADOOP-2559 still keeps the first replica on the local machine. Thus, reopening this JIRA.

          Runping Qi added a comment -

          By analyzing disk utilization data, we have found that the four disks on each node were not evenly utilized.
          It seems that the first disk was the most heavily utilized, which is consistent with the potential impact
          of the current policy for selecting a volume for a new block on data nodes.

          Allen Wittenauer added a comment -

          Rather than random, I'd like the capability to give weights, as mentioned in HADOOP-2150.
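
          A minimal sketch of weighted selection; the weights here are invented (HADOOP-2150 is
          where the weighting idea is raised):

```java
import java.util.Random;

// Illustration of weighted volume selection, not actual Hadoop code: each
// volume gets a weight, and its chance of being picked is its weight
// divided by the total. The weights below are invented.
public class WeightedChooser {
    private final Random random = new Random();

    int choose(double[] weights) {
        double total = 0;
        for (double w : weights) {
            total += w;
        }
        double r = random.nextDouble() * total;   // point in [0, total)
        for (int i = 0; i < weights.length; i++) {
            r -= weights[i];
            if (r < 0) {
                return i;
            }
        }
        return weights.length - 1;                // guard against rounding
    }

    public static void main(String[] args) {
        // E.g. favor a larger or faster disk with a higher weight.
        double[] weights = {1.0, 1.0, 2.0, 4.0};
        System.out.println("chose volume " + new WeightedChooser().choose(weights));
    }
}
```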


            People

            • Assignee: dhruba borthakur
            • Reporter: Runping Qi
            • Votes: 0
            • Watchers: 5
