Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Not A Problem
Description
We have hit a problem with the preemption implementation in the FairScheduler where the following happens:
- Job X falls short of its fair share or min share and causes N tasks to be preempted.
- When the freed slots are then scheduled, tasks from some other job are scheduled in them instead of tasks from job X.
- After preemption_interval has passed, job X finds that it is still under-scheduled and requests preemption again, and the cycle repeats from the first step.
This has caused widespread preemption of tasks, and the cluster has gone from high utilization to low utilization within a few minutes.
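The cycle above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not FairScheduler code: the class name, method name, and parameters are hypothetical, and each loop iteration stands for one preemption_interval. It counts how many tasks are killed when the freed slots never reach the starved job.

```java
class PreemptionLoopSketch {
    /**
     * Toy model of the cycle described in this issue (hypothetical, not FairScheduler code).
     *
     * @param deficit                  slots the starved job is short of (N in the description)
     * @param intervals                number of preemption intervals to simulate
     * @param freedSlotsGoToStarvedJob whether freed slots are actually given to the starved job
     * @return total number of tasks killed by preemption
     */
    static int tasksKilled(int deficit, int intervals, boolean freedSlotsGoToStarvedJob) {
        int killed = 0;
        int remainingDeficit = deficit;
        for (int i = 0; i < intervals && remainingDeficit > 0; i++) {
            // Step 1: the starved job preempts as many tasks as it is short of.
            killed += remainingDeficit;
            // Step 2: the freed slots are scheduled. If they go to the starved job,
            // its deficit is cleared; otherwise another job takes them.
            if (freedSlotsGoToStarvedJob) {
                remainingDeficit = 0;
            }
            // Step 3: if the deficit is unchanged, the next interval preempts again.
        }
        return killed;
    }

    public static void main(String[] args) {
        System.out.println(tasksKilled(10, 5, false)); // slots keep going to other jobs
        System.out.println(tasksKilled(10, 5, true));  // slots reach the starved job
    }
}
```

In this model a deficit of 10 slots over 5 intervals kills 50 tasks when the freed slots go elsewhere, versus 10 when they reach the starved job, which is the amplification reported here.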
After some analysis of the logs, one of the biggest contributing factors seems to be how jobs are scheduled when a heartbeat advertises multiple free slots. Currently the scheduler goes over all the jobs/pools in sorted order until all the slots are exhausted. As a result, lower-priority jobs (which may have just been preempted) also get scheduled.
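The multi-slot behaviour can be sketched as follows. This is one possible reading of the traversal described above, not the actual FairScheduler implementation: it assumes each job receives at most one task per pass over the sorted list, and the class, method, and job names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class HeartbeatAssignmentSketch {
    /**
     * Fills the free slots advertised by one heartbeat (hypothetical model).
     *
     * @param jobsInOrder job names sorted from most to least starved
     * @param pending     pending task count for each job, same order as jobsInOrder
     * @param freeSlots   number of free slots in the heartbeat
     * @return the job that received each slot, in assignment order
     */
    static List<String> assign(String[] jobsInOrder, int[] pending, int freeSlots) {
        int[] left = pending.clone(); // do not mutate the caller's array
        List<String> assigned = new ArrayList<>();
        boolean progress = true;
        // Keep walking the sorted list until the slots run out or no job has tasks left.
        while (assigned.size() < freeSlots && progress) {
            progress = false;
            for (int i = 0; i < jobsInOrder.length; i++) {
                if (assigned.size() == freeSlots) {
                    break;
                }
                if (left[i] > 0) {
                    left[i]--;                    // assumption: one task per job per pass
                    assigned.add(jobsInOrder[i]);
                    progress = true;
                }
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        // X is the starved job; Y and Z are lower priority and may have just been preempted.
        System.out.println(assign(new String[] {"X", "Y", "Z"}, new int[] {4, 4, 4}, 4));
    }
}
```

With four free slots and three jobs that each have four pending tasks, this prints `[X, Y, Z, X]`: half of the slots go to Y and Z even though X could have used all four, so X can remain under-scheduled after the heartbeat.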
Issue Links
- relates to MAPREDUCE-1204 Fair Scheduler preemption may preempt tasks running in slots unusable by the preempting job (Open)