Details
- Bug
- Status: Open
- Major
- Resolution: Unresolved
- 0.22.1
- None
- None
Description
We observed this in one of our testing clusters.
One framework (under development) keeps launching tasks using the same task_id. We don't expect the master to crash even if the framework is not doing what it's supposed to do. However, the following sequence of events triggers a CHECK failure, and the master then keeps crashing.
1) frameworkA launches task 'task_id_1' on slaveA
2) master fails over
3) slaveA has not re-registered yet
4) frameworkA re-registers and launches task 'task_id_1' on slaveB
5) slaveA re-registers and the master adds task 'task_id_1' to frameworkA
6) CHECK failure in addTask
I0716 21:52:50.759305 28805 master.hpp:159] Adding task 'task_id_1' with resources cpus(*):4; mem(*):32768 on slave 20150417-232509-1735470090-5050-48870-S25 (hostname)
...
...
F0716 21:52:50.760136 28805 master.hpp:362] Check failed: !tasks.contains(task->task_id()) Duplicate task 'task_id_1' of framework <framework_id>
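The sequence of events can be replayed with a small model. This is only a sketch, not Mesos code: the `Framework` class, its `add_task` method, and `replay_failover` are hypothetical names, and the sketch assumes the new master starts with no task state and learns about existing tasks only when slaves re-register. An `AssertionError` stands in for the fatal CHECK in addTask.

```python
class Framework:
    """Hypothetical stand-in for the master's per-framework task bookkeeping."""

    def __init__(self, framework_id):
        self.framework_id = framework_id
        self.tasks = {}  # task_id -> slave_id

    def add_task(self, task_id, slave_id):
        # Models the failing check: a duplicate task ID is treated as fatal.
        if task_id in self.tasks:
            raise AssertionError(
                f"Duplicate task '{task_id}' of framework {self.framework_id}"
            )
        self.tasks[task_id] = slave_id


def replay_failover():
    """Replay steps 1-6 and return the failure message, or None if no failure."""
    # 1) frameworkA launches 'task_id_1' on slaveA; slaveA keeps a record of it.
    slave_a_tasks = {"task_id_1": "frameworkA"}

    # 2) Master fails over. Assumption: the new master has empty task state.
    framework_a = Framework("frameworkA")

    # 3) slaveA has not re-registered yet, so the new master does not know
    #    that 'task_id_1' is already in use.

    # 4) frameworkA re-registers and launches 'task_id_1' on slaveB.
    framework_a.add_task("task_id_1", "slaveB")

    # 5) slaveA re-registers and reports its tasks to the master.
    # 6) Adding the reported task collides with the one launched in step 4.
    try:
        for task_id in slave_a_tasks:
            framework_a.add_task(task_id, "slaveA")
    except AssertionError as error:
        return str(error)
    return None


if __name__ == "__main__":
    print(replay_failover())
```

In this model the collision is only detectable at step 5, because at step 4 the master has no record of the task still held by slaveA.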
Issue Links
- Blocked: MESOS-8353 Duplicate task for same framework on multiple agents crashes out master after failover (Resolved)
- is blocked by: MESOS-3351 duplicated slave id in master after master failover (Resolved)
- is duplicated by: MESOS-6785 CHECK failure on duplicate task IDs (Resolved)
- is related to: MESOS-6805 Check unreachable task cache for task ID collisions on launch (Resolved)