[MAPREDUCE-1204] Fair Scheduler preemption may preempt tasks running in slots unusable by the preempting job - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.21.0
Fix Version/s: None
Component/s: contrib/fair-share
Labels: None

Description

The current preemption code first calculates how many tasks must be preempted to satisfy the min share constraints, then kills that many tasks from other jobs, choosing the youngest tasks first. This works in the general case, but there are edge cases where it causes problems.

For example, suppose the preempting job has blacklisted ("marked flaky") a particular task tracker, and that tracker is running the youngest task. Preemption can still kill that task. The preempting job will then refuse the freed slot because the tracker is blacklisted, and the task that was just killed gets rescheduled in that same slot. This cycle repeats until a new slot opens in the cluster.
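To make the failure mode concrete, here is a minimal sketch of the "kill the youngest N tasks" selection described above. It is not the Fair Scheduler's actual code: `RunningTask`, `selectYoungest`, and `killIsWasted` are hypothetical names invented for illustration, and trackers are modeled as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

class NaivePreemptionSketch {
    /** Hypothetical stand-in for a running task; not a Hadoop class. */
    static final class RunningTask {
        final String id;
        final String tracker;
        final long startTimeMs;

        RunningTask(String id, String tracker, long startTimeMs) {
            this.id = id;
            this.tracker = tracker;
            this.startTimeMs = startTimeMs;
        }
    }

    /** Picks the `count` youngest tasks without asking whether the preempting job can use their slots. */
    static List<RunningTask> selectYoungest(List<RunningTask> running, int count) {
        List<RunningTask> sorted = new ArrayList<>(running);
        // Youngest first: the largest start time sorts to the front.
        sorted.sort(Comparator.comparingLong((RunningTask t) -> t.startTimeMs).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(count, sorted.size())));
    }

    /** True when the freed slot sits on a tracker the preempting job has blacklisted. */
    static boolean killIsWasted(RunningTask victim, Set<String> blacklistedTrackers) {
        return blacklistedTrackers.contains(victim.tracker);
    }

    public static void main(String[] args) {
        List<RunningTask> running = Arrays.asList(
            new RunningTask("t_old", "trackerA", 1000L),
            new RunningTask("t_young", "trackerB", 5000L));
        Set<String> blacklist = Collections.singleton("trackerB");

        RunningTask victim = selectYoungest(running, 1).get(0);
        System.out.println(victim.id + " wasted kill: " + killIsWasted(victim, blacklist));
    }
}
```

In this toy setup the youngest task runs on the blacklisted tracker, so the kill frees a slot the preempting job will refuse, which is the start of the kill/reschedule cycle.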

I don't have a good test case for this yet, but logically it is possible.

One potential fix would be to add an API to JobInProgress that functions identically to obtainNewMapTask but does not actually schedule the task. The preemption code could then call it while iterating through the sorted preemption list, to check that the preempting job can actually make use of each candidate slot before killing the task running in it.
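A rough sketch of how that check might slot into victim selection follows. The `PreemptingJob.couldRunOn` method is a hypothetical stand-in for the proposed dry-run API (it is not an existing JobInProgress method), and `RunningTask` and `selectVictims` are likewise invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

class UsableSlotPreemptionSketch {
    /** Hypothetical stand-in for a running task; not a Hadoop class. */
    static final class RunningTask {
        final String id;
        final String tracker;
        final long startTimeMs;

        RunningTask(String id, String tracker, long startTimeMs) {
            this.id = id;
            this.tracker = tracker;
            this.startTimeMs = startTimeMs;
        }
    }

    /** Hypothetical dry-run check: would the job accept a slot on this tracker? Schedules nothing. */
    interface PreemptingJob {
        boolean couldRunOn(String tracker);
    }

    /** Walks candidates youngest-first and keeps only tasks whose slot the preempting job could use. */
    static List<RunningTask> selectVictims(List<RunningTask> running, PreemptingJob job, int needed) {
        List<RunningTask> sorted = new ArrayList<>(running);
        sorted.sort(Comparator.comparingLong((RunningTask t) -> t.startTimeMs).reversed());

        List<RunningTask> victims = new ArrayList<>();
        for (RunningTask candidate : sorted) {
            if (victims.size() >= needed) {
                break;
            }
            // Skip slots the job would refuse (for example, on a blacklisted tracker).
            if (job.couldRunOn(candidate.tracker)) {
                victims.add(candidate);
            }
        }
        return victims;
    }

    public static void main(String[] args) {
        List<RunningTask> running = Arrays.asList(
            new RunningTask("t_old", "trackerA", 1000L),
            new RunningTask("t_mid", "trackerC", 3000L),
            new RunningTask("t_young", "trackerB", 5000L));
        Set<String> blacklist = Collections.singleton("trackerB");

        List<RunningTask> victims = selectVictims(running, tracker -> !blacklist.contains(tracker), 1);
        System.out.println("victim: " + victims.get(0).id);
    }
}
```

With `trackerB` blacklisted, the youngest task is skipped and the next-youngest task on a usable tracker is chosen instead. If no usable candidate exists, the list comes back shorter than `needed`, so nothing is killed in vain.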

Issue Links

is related to: MAPREDUCE-2205 FairScheduler should not re-schedule jobs that have just been preempted (Resolved)

People

Assignee: Unassigned

Reporter: Todd Lipcon

Votes: 0

Watchers: 7

Dates

Created: 10/Nov/09 22:40

Updated: 13/Dec/10 23:31