[MESOS-6608] Do not transition tasks to TASK_KILLED on framework teardown - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: master
Labels:
- foundations
- mesosphere

Epic Link:
Mesos Agent Lifecycle

Description

When a framework is torn down or disconnects, we currently transition the framework's tasks to state TASK_KILLED at the master. See

This happens at the master; concurrently, the master sends a ShutdownFrameworkMessage to each agent that is running one of the framework's tasks.

Marking the task KILLED in this manner is problematic for two reasons:

1. The task is still running and may continue running for an unbounded length of time if the agent becomes partitioned.
2. KILLED is usually used to denote tasks that are killed in response to a "kill task" operation.

My primary concern here is #1. We could pick a different terminal state to address #2 but I think that is secondary: transitioning the task to any terminal state before it has been terminated is problematic, in my view.
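To make the first concern concrete, here is a minimal toy model of the current behavior. It is not Mesos code: the `Master` and `Agent` classes and their methods are hypothetical stand-ins, and a partitioned agent is modeled simply as one that never receives the shutdown message.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Toy agent (hypothetical): holds the real state of each task it runs."""
    reachable: bool = True
    tasks: dict = field(default_factory=dict)  # task_id -> actual state

    def shutdown_framework(self):
        # The message only takes effect if the agent is reachable;
        # a partitioned agent never sees it and its tasks keep running.
        if self.reachable:
            for task_id in self.tasks:
                self.tasks[task_id] = "TASK_KILLED"


@dataclass
class Master:
    """Toy master (hypothetical): holds its own view of each task's state."""
    view: dict = field(default_factory=dict)  # task_id -> state recorded at master

    def teardown_current(self, agent):
        # Current behavior: mark every task terminal right away...
        for task_id in self.view:
            self.view[task_id] = "TASK_KILLED"
        # ...and concurrently ask the agent to shut the framework down.
        agent.shutdown_framework()


agent = Agent(reachable=False, tasks={"t1": "TASK_RUNNING"})
master = Master(view={"t1": "TASK_RUNNING"})
master.teardown_current(agent)
print(master.view["t1"], agent.tasks["t1"])  # prints: TASK_KILLED TASK_RUNNING
```

The master's view and the agent's actual state disagree for as long as the partition lasts.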

Proposed behavior: when the framework teardown is applied, we keep the task in its current state at the master. Then when the agent receives the ShutdownFrameworkMessage, it can shut down the task and eventually respond with a terminal status update. At that point we can transition the task into the appropriate terminal state (whether it be KILLED, FAILED, GONE, or a new state).
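A rough sketch of the proposed flow, again as a toy model rather than actual master code. The class, method names, and the set of terminal states used here are illustrative assumptions.

```python
# Toy model of the proposed flow; not Mesos code.
# The set of terminal states is an assumption for illustration.
TERMINAL_STATES = {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_GONE"}


class Master:
    def __init__(self, tasks):
        self.tasks = dict(tasks)  # task_id -> state recorded at master
        self.torn_down = False

    def teardown_framework(self):
        # Proposed: record the teardown but leave task states untouched.
        self.torn_down = True
        # The caller would deliver this to each agent running the tasks.
        return "ShutdownFrameworkMessage"

    def on_status_update(self, task_id, state):
        # Only a terminal update from the agent moves the task to a
        # terminal state at the master.
        if state in TERMINAL_STATES:
            self.tasks[task_id] = state


master = Master({"t1": "TASK_RUNNING"})
message = master.teardown_framework()
state_after_teardown = master.tasks["t1"]     # still TASK_RUNNING
master.on_status_update("t1", "TASK_KILLED")  # agent reports termination
state_after_update = master.tasks["t1"]       # now TASK_KILLED
```

If the agent is partitioned and never reports back, the task simply stays in its last known state at the master, which matches reality more closely than an early KILLED.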

This will probably require some changes to the status update machinery, since we currently drop status updates for terminating frameworks at the agent. Since the scheduler is gone, we'd need to have the master ack the status update rather than the framework.
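One way the master-side acknowledgement could look, sketched as a toy model. Everything here is hypothetical (class names, `handle_update`, the return strings); it only illustrates the idea that the agent keeps an update pending until someone acks it, and that the master would do so when the framework no longer exists.

```python
import uuid


class Agent:
    """Toy agent (hypothetical): keeps each status update until it is acked."""

    def __init__(self):
        self.pending = {}  # update id -> (task_id, state)

    def send_update(self, task_id, state):
        update_id = str(uuid.uuid4())
        self.pending[update_id] = (task_id, state)
        return update_id

    def acknowledge(self, update_id):
        # Once acked, the agent can stop retrying this update.
        self.pending.pop(update_id, None)


class Master:
    """Toy master (hypothetical): acks on behalf of torn-down frameworks."""

    def __init__(self, torn_down_frameworks):
        self.torn_down = set(torn_down_frameworks)
        self.tasks = {}  # task_id -> state recorded at master

    def handle_update(self, agent, framework_id, update_id):
        task_id, state = agent.pending[update_id]
        self.tasks[task_id] = state
        if framework_id in self.torn_down:
            # No scheduler is left to ack, so the master acks itself.
            agent.acknowledge(update_id)
            return "acked-by-master"
        # Otherwise the scheduler is expected to ack as usual.
        return "forwarded-to-scheduler"


agent = Agent()
master = Master(torn_down_frameworks={"fw-1"})
update_id = agent.send_update("t1", "TASK_KILLED")
result = master.handle_update(agent, "fw-1", update_id)
```

After `handle_update` returns, the agent has nothing pending for the torn-down framework and the master has recorded the terminal state.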

Issue Links

is related to
- MESOS-6602 Shutdown completed frameworks when unreachable agent re-registers (Resolved)
- MESOS-1736 Completed tasks shown as running (Resolved)

relates to
- MESOS-10194 Mesos master failure "Check failed: 'get_(role)' Must be SOME" (Accepted)

People

Assignee: Unassigned

Reporter: Neil Conway

Votes: 0

Watchers: 2

Dates

Created: 18/Nov/16 20:30

Updated: 13/Nov/20 15:03