[MESOS-3070] Master CHECK failure if a framework uses duplicated task id. - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.22.1
Fix Version/s: None
Component/s: master
Labels: None

Description

We observed this in one of our test clusters.

A framework (under development) kept launching tasks with the same task_id. We don't expect the master to crash even when a framework misbehaves. However, the following sequence of events triggers a CHECK failure, and the master then keeps crashing:

1) frameworkA launches task 'task_id_1' on slaveA
2) master fails over
3) slaveA has not re-registered yet
4) frameworkA re-registers and launches task 'task_id_1' on slaveB
5) slaveA re-registers, and the master adds task 'task_id_1' to frameworkA
6) CHECK failure in addTask

I0716 21:52:50.759305 28805 master.hpp:159] Adding task 'task_id_1' with resources cpus(*):4; mem(*):32768 on slave 20150417-232509-1735470090-5050-48870-S25 (hostname)
...
...
F0716 21:52:50.760136 28805 master.hpp:362] Check failed: !tasks.contains(task->task_id()) Duplicate task 'task_id_1' of framework <framework_id>
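
The six steps above can be replayed against a toy model of the master's per-framework task map. This is only a sketch, not Mesos code: `ToyMaster`, `DuplicateTaskError`, `fail_over`, and `reproduce` are hypothetical names, and the model assumes that a newly elected master starts with no task records and rebuilds them as frameworks launch tasks and slaves re-register. The membership test mirrors the invariant in the log (`!tasks.contains(task->task_id())`), but raises an exception where the real master aborts.

```python
class DuplicateTaskError(Exception):
    """Stands in for the CHECK failure; the real master aborts the process."""


class ToyMaster:
    """Hypothetical model of the master's per-framework task bookkeeping."""

    def __init__(self):
        # framework_id -> {task_id -> slave_id}
        self.tasks = {}

    def add_task(self, framework_id, task_id, slave_id):
        framework_tasks = self.tasks.setdefault(framework_id, {})
        # Same invariant as the failing check: a task ID may appear
        # only once per framework, regardless of which slave runs it.
        if task_id in framework_tasks:
            raise DuplicateTaskError(
                f"Duplicate task '{task_id}' of framework {framework_id}"
            )
        framework_tasks[task_id] = slave_id

    def fail_over(self):
        # Assumption: the new master has no in-memory task state.
        self.tasks = {}


def reproduce():
    master = ToyMaster()
    master.add_task("frameworkA", "task_id_1", "slaveA")  # step 1
    master.fail_over()  # step 2 (step 3: slaveA has not re-registered yet)
    master.add_task("frameworkA", "task_id_1", "slaveB")  # step 4
    try:
        # step 5: slaveA re-registers and reports its running task
        master.add_task("frameworkA", "task_id_1", "slaveA")
    except DuplicateTaskError as error:
        return str(error)  # step 6
    return None
```

In this model the launch in step 4 succeeds only because the failover erased the record of the task on slaveA, so the duplicate is not detected until slaveA re-registers in step 5.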

Issue Links

Blocked: MESOS-8353 Duplicate task for same framework on multiple agents crashes out master after failover (Resolved)
is blocked by: MESOS-3351 duplicated slave id in master after master failover (Resolved)
is duplicated by: MESOS-6785 CHECK failure on duplicate task IDs (Resolved)
is related to: MESOS-6805 Check unreachable task cache for task ID collisions on launch (Resolved)

People

Assignee: Unassigned
Reporter: Jie Yu
Shepherd: Vinod Kone
Votes: 0
Watchers: 3

Dates

Created: 16/Jul/15 23:13
Updated: 10/Jan/18 19:12