[MESOS-8411] Killing a queued task can lead to the command executor never terminating. - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.3.1, 1.4.1, 1.5.0
Fix Version/s: 1.4.2, 1.5.1, 1.6.0
Component/s: agent
Labels: None

Sprint: Mesosphere Sprint 72, Mesosphere Sprint 73, Mesosphere Sprint 74
Story Points: 5

Description

If a task is killed while the executor is re-registering, we remove it from the queued tasks and shut down the executor if none of its initial tasks could be delivered. However, there is a case (within Slave::___run) where we leave the executor running. The race is:

1. Command-executor task launched.
2. Command executor sends registration message. Agent tells containerizer to update the resources before it sends the tasks to the executor.
3. Kill arrives, and we synchronously remove the task from queued tasks.
4. Containerizer finishes updating the resources, and in Slave::___run the killed task is ignored.
5. Command executor stays running!
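
The sequence above can be replayed in a small, self-contained model. This is only a sketch for illustration: `ExecutorModel`, `killQueuedTask`, `onResourcesUpdated` and `raceLeavesIdleExecutor` are hypothetical names, not taken from the Mesos source, and the real agent code is asynchronous and far more involved.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, heavily simplified stand-in for the agent's per-executor
// state. None of these names come from the Mesos source.
struct ExecutorModel {
  std::set<std::string> queuedTasks;    // accepted by the agent, not yet delivered
  std::set<std::string> launchedTasks;  // delivered to the executor
  bool running = true;
};

// The kill arrives: the task is removed from the queue synchronously.
void killQueuedTask(ExecutorModel& executor, const std::string& taskId) {
  executor.queuedTasks.erase(taskId);
}

// Continuation that runs once the containerizer has updated the resources.
// It delivers the task only if it is still queued. Otherwise the task is
// ignored and, as in the reported bug, nothing is done about the executor.
void onResourcesUpdated(ExecutorModel& executor, const std::string& taskId) {
  if (executor.queuedTasks.erase(taskId) == 0) {
    return;  // The task was killed in the meantime: ignored.
  }
  executor.launchedTasks.insert(taskId);
}

// Replays the race and reports whether the executor is left running
// without any queued or launched task.
bool raceLeavesIdleExecutor() {
  ExecutorModel executor;
  executor.queuedTasks.insert("task-1");   // task launched, executor registering
  killQueuedTask(executor, "task-1");      // kill arrives during the update
  onResourcesUpdated(executor, "task-1");  // update finishes, task is ignored
  return executor.running &&
         executor.queuedTasks.empty() &&
         executor.launchedTasks.empty();
}
```

In this model `raceLeavesIdleExecutor()` returns true: the executor is still marked running although it will never receive a task.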

Executors could have a timeout to handle this case, but it's not clear that all executors will implement this correctly. It would be better to have a defensive policy that will shut down an executor if all of its initial batch of tasks were killed prior to delivery.

To implement this, one approach discussed with vinodkone is to look at the executor's running, terminated-but-unacknowledged, and completed tasks and, if all three are empty, shut the executor down in the Slave::___run path. This requires checking that the completed task cache size is set to at least 1, and it assumes that completed tasks are not cleared based on time or during agent recovery.
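
A minimal sketch of that check, under stated assumptions: `TaskCounts` and `shouldShutdownIdleExecutor` are hypothetical names, not the actual fix, and refusing to decide when the completed-task cache holds zero entries is an assumption made here to reflect the cache-size requirement above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of the three task sets named in the description.
// The real agent tracks full task objects, not counts.
struct TaskCounts {
  std::size_t running = 0;
  std::size_t terminatedUnacked = 0;  // terminated, status update not yet acked
  std::size_t completed = 0;          // entries in the completed-task cache
};

// Returns true when the executor has never had a task delivered and can be
// shut down. Assumption: with a completed-task cache of capacity 0, an empty
// "completed" set proves nothing, so the check conservatively answers false.
bool shouldShutdownIdleExecutor(const TaskCounts& tasks,
                                std::size_t completedCacheCapacity) {
  if (completedCacheCapacity == 0) {
    return false;
  }
  return tasks.running == 0 &&
         tasks.terminatedUnacked == 0 &&
         tasks.completed == 0;
}
```

The idea is that such a predicate would be evaluated in the path where a killed task is ignored, so that an executor whose entire initial batch was killed before delivery does not linger.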

Issue Links

is related to: MESOS-8459 Executor could linger without ever receiving any tasks (Open)

relates to: MESOS-5380 Killing a queued task can cause the corresponding command executor to never terminate. (Resolved)

People

Assignee: Meng Zhu

Reporter: Benjamin Mahler

Shepherd: Benjamin Mahler

Votes: 0

Watchers: 5

Dates

Created: 08/Jan/18 01:13

Updated: 22/Mar/19 16:41

Resolved: 13/Feb/18 06:04