[HDFS-738] Improve the disk utilization of HDFS - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: datanode
Labels: None

Description

The HDFS data node currently assigns writers to disks randomly. This works well when there are many readers/writers on a single data node, but it can cause a lot of contention when there are only 4 readers/writers on a 4-disk node, since several of them may end up on the same disk while other disks sit idle.

A better approach is to introduce a base class, DiskHandler, that registers all disk operations (reads and writes) and selects the best disk for writing new blocks. A good DiskHandler strategy would direct writes to the disks with more free space and less recent activity. Many other strategies are possible.
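As a rough illustration of the proposal above, here is a minimal sketch of what such a base class and one strategy might look like. Every name in it (`DiskHandler`, `Disk`, `opStarted`, `opFinished`, `chooseDiskForWrite`, `LeastBusyMostFreeHandler`) is hypothetical and not part of any existing HDFS API, and the ordering used (fewest in-flight operations, then least recent activity, then most free space) is only one of the many possible strategies. Timestamps are passed in by the caller to keep the sketch deterministic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical base class: tracks disk activity and picks a disk for new blocks. */
abstract class DiskHandler {
    /** Per-disk bookkeeping (illustrative fields only). */
    static final class Disk {
        final String path;
        long freeBytes;
        long lastActivity; // timestamp of the most recent registered operation
        int activeOps;     // reads/writes currently in flight

        Disk(String path, long freeBytes) {
            this.path = path;
            this.freeBytes = freeBytes;
        }
    }

    protected final List<Disk> disks = new ArrayList<>();

    synchronized Disk addDisk(String path, long freeBytes) {
        Disk d = new Disk(path, freeBytes);
        disks.add(d);
        return d;
    }

    /** Register the start of a read or write on a disk. */
    synchronized void opStarted(Disk d, long now) {
        d.activeOps++;
        d.lastActivity = now;
    }

    /** Register the end of an operation; bytesWritten is 0 for reads. */
    synchronized void opFinished(Disk d, long now, long bytesWritten) {
        d.activeOps--;
        d.lastActivity = now;
        d.freeBytes -= bytesWritten;
    }

    /** Strategy hook: return the disk for a new block, or null if none has room. */
    abstract Disk chooseDiskForWrite(long blockSize);
}

/** One possible strategy: prefer idle, least recently used disks with more free space. */
class LeastBusyMostFreeHandler extends DiskHandler {
    @Override
    synchronized Disk chooseDiskForWrite(long blockSize) {
        Disk best = null;
        for (Disk d : disks) {
            if (d.freeBytes < blockSize) {
                continue; // not enough room for the block
            }
            if (best == null || better(d, best)) {
                best = d;
            }
        }
        return best;
    }

    private static boolean better(Disk a, Disk b) {
        if (a.activeOps != b.activeOps) {
            return a.activeOps < b.activeOps;       // fewer in-flight operations
        }
        if (a.lastActivity != b.lastActivity) {
            return a.lastActivity < b.lastActivity; // less recent activity
        }
        return a.freeBytes > b.freeBytes;           // more free space
    }
}
```

Keeping the selection logic behind an abstract method means alternative strategies (for example, weighting free space more heavily) could be swapped in without changing how operations are registered.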

This could significantly improve HDFS multi-threaded write throughput. We are seeing <25MB/s/disk on a 4-node cluster with 4 disks per node (replication already accounted for) with 8 concurrent writers (24 writers counting replication). I believe we can roughly double that.

Issue Links

is related to

HDFS-325 DFS should not use round robin policy in determing on which volume (file system partition) to allocate for the next block (Status: Reopened)

People

Assignee: Unassigned

Reporter: Zheng Shao

Votes: 0

Watchers: 13

Dates

Created: 27/Oct/09 00:49

Updated: 27/Oct/09 19:30