[HADOOP-2829] JT should consider the disk each task is on before scheduling jobs... - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Invalid
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

The DataNode can support a JBOD configuration, in which each block is stored on a specific disk. However, this information is not exported to, or considered by, the JobTracker (JT) when it assigns tasks. This leads to suboptimal disk use: if 4 slots are in use, 2 of the running tasks will likely be reading from the same disk, and we observe those tasks running more slowly than other tasks on the same machine.
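
To put a rough number on "likely", here is a minimal sketch. It assumes, purely for illustration, that the node has D disks and that each of k concurrently running tasks reads a block stored on a disk chosen uniformly and independently. Neither the disk count nor this placement model comes from the issue. Under these assumptions, the chance that at least two tasks share a disk is 1 - D!/((D-k)! * D^k), the same calculation as the birthday problem.

```python
from math import perm


def p_disk_collision(num_disks: int, num_tasks: int) -> float:
    """Probability that at least two of num_tasks concurrently running tasks
    read from the same disk, ASSUMING each task's input block sits on one of
    num_disks disks chosen uniformly and independently (illustrative model)."""
    if num_tasks > num_disks:
        return 1.0  # pigeonhole: some disk must be shared
    # P(all tasks on distinct disks) = D * (D-1) * ... * (D-k+1) / D**k
    p_all_distinct = perm(num_disks, num_tasks) / num_disks ** num_tasks
    return 1.0 - p_all_distinct


print(p_disk_collision(4, 4))  # 4 disks, 4 busy slots
print(p_disk_collision(4, 2))  # 4 disks, 2 busy slots
```

With 4 disks and 4 busy slots this gives 1 - 24/256 = 0.90625, so under these assumptions it is the common case that some pair of tasks shares a disk. With only 2 tasks on 4 disks the probability drops to 0.25.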

We could follow a number of strategies to address this.

For example, the DataNodes could support a "which disk is this block on?" call. The JT could then discover this information and assign tasks accordingly.
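
The following is a minimal sketch of how the JT might use such a call. It assumes that a hypothetical per-block disk ID is already attached to each pending task, and that the JT knows which disks on a host are currently serving running tasks. `PendingTask`, `pick_task`, and the preference order are illustrative inventions, not Hadoop APIs.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set


@dataclass(frozen=True)
class PendingTask:
    task_id: str
    host: str     # node holding the task's input block
    disk_id: int  # disk on that node, as a hypothetical "which disk is this block on?" call might report


def pick_task(pending: Sequence[PendingTask], host: str,
              busy_disks: Set[int]) -> Optional[PendingTask]:
    """Choose a task for a free slot on `host`.

    Preference order (a heuristic assumed for illustration):
      1. a node-local task whose input disk is not already busy
      2. any node-local task
      3. any pending task at all
    """
    local = [t for t in pending if t.host == host]
    for t in local:
        if t.disk_id not in busy_disks:
            return t
    if local:
        return local[0]
    return pending[0] if pending else None


pending = [
    PendingTask("t1", "nodeA", 0),
    PendingTask("t2", "nodeA", 1),
    PendingTask("t3", "nodeB", 0),
]
print(pick_task(pending, "nodeA", busy_disks={0}).task_id)  # disk 0 busy, so t2 is preferred
```

In this example, a free slot on nodeA with disk 0 busy gets t2 rather than t1. Falling back to a same-disk local task keeps data locality ahead of disk spreading. Whether that is the right trade-off is part of what the issue says needs study.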

Of course, the TaskTracker (TT) itself uses the disks for merge and temporary space, and the DataNode on the same machine can also be accessed by off-node clients. It is therefore not clear that optimizing all of this is simple enough to be worthwhile.

This issue deserves study.

People

Assignee:: Unassigned

Reporter:: Eric Baldeschwieler

Votes:: 0

Watchers:: 3

Dates

Created:: 14/Feb/08 07:37

Updated:: 23/Feb/18 13:18

Resolved:: 23/Feb/18 13:18