[MESOS-4050] Change task reconciliation to not omit unknown tasks - ASF JIRA

Details

Type: Improvement
Status: Accepted
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: framework, master
Labels:
- mesosphere
- reconciliation

Description

If the master fails over and a framework tries to do an explicit reconciliation for a task running on an agent that has not reregistered yet (and agent_reregister_timeout has not been exceeded), the master will not send a reconciliation response for that task.
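To make the reported behavior concrete, here is a minimal Python sketch of how an explicit reconciliation request plays out during the reregistration window. `MasterState`, `reconcile_explicit`, and the (task_id, agent_id) request shape are hypothetical simplifications, not the Mesos master's C++ code or the scheduler API. The sketch assumes every request names an agent and, for simplicity, reports any other unknown task as TASK_LOST.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class MasterState:
    """Hypothetical, simplified view of the master shortly after failover."""
    # agent_id -> {task_id: last known state}, for agents that have reregistered
    registered: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # agents recovered from the registry that have not reregistered yet
    recovered: Set[str] = field(default_factory=set)


def reconcile_explicit(master: MasterState,
                       requests: List[Tuple[str, str]]) -> Dict[str, str]:
    """Return {task_id: state} for every task the master answers about.

    Each request is a (task_id, agent_id) pair. Tasks on agents that are
    still inside the reregistration window get no reply at all.
    """
    replies: Dict[str, str] = {}
    for task_id, agent_id in requests:
        if agent_id in master.recovered:
            continue  # agent has not reregistered yet: task is silently omitted
        known = master.registered.get(agent_id, {})
        # Simplification: any task the master does not know is TASK_LOST.
        replies[task_id] = known.get(task_id, "TASK_LOST")
    return replies


master = MasterState(registered={"a1": {"t1": "TASK_RUNNING"}}, recovered={"a2"})
print(reconcile_explicit(master, [("t1", "a1"), ("t2", "a2")]))
# {'t1': 'TASK_RUNNING'}  (no entry at all for t2)
```

From the framework's side, the missing entry for `t2` is indistinguishable from a dropped message, which is the source of the confusion described below.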

This is confusing for framework authors. It seems better for the master to explicitly report all the information it has: e.g., to return "task X is in an unknown state" rather than returning nothing. Then, as more information arrives (e.g., the agent reregisters or the task definitively dies), the task state would transition accordingly. We might want to do this via a new task state, e.g., TASK_REREGISTER_PENDING.
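The proposed behavior can be sketched in the same simplified model: every requested task gets a reply, and tasks on not-yet-reregistered agents are later resolved by a follow-up update. All names here are hypothetical; `TASK_REREGISTER_PENDING` is only the name floated in this issue, not an existing Mesos task state. Resolving pending tasks to TASK_LOST when the agent never comes back (or does not report the task) is an assumption of the sketch; the real master's handling may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple

# Placeholder state named after the proposal in this issue; it is NOT an
# existing Mesos TaskState.
PENDING = "TASK_REREGISTER_PENDING"


@dataclass
class MasterState:
    """Hypothetical, simplified view of the master shortly after failover."""
    registered: Dict[str, Dict[str, str]] = field(default_factory=dict)
    recovered: Set[str] = field(default_factory=set)


def reconcile_explicit(master: MasterState,
                       requests: List[Tuple[str, str]]) -> Dict[str, str]:
    """Answer every requested (task_id, agent_id) pair; never omit a task."""
    replies: Dict[str, str] = {}
    for task_id, agent_id in requests:
        if agent_id in master.recovered:
            replies[task_id] = PENDING  # "unknown for now" instead of silence
        else:
            known = master.registered.get(agent_id, {})
            replies[task_id] = known.get(task_id, "TASK_LOST")
    return replies


def on_agent_reregistered(master: MasterState, agent_id: str,
                          tasks: Dict[str, str],
                          pending: Iterable[str]) -> Dict[str, str]:
    """Agent is back: resolve each pending task from what the agent reports."""
    master.recovered.discard(agent_id)
    master.registered[agent_id] = dict(tasks)
    # A pending task the agent does not report is treated as lost here.
    return {t: tasks.get(t, "TASK_LOST") for t in pending}


def on_reregister_timeout(master: MasterState, agent_id: str,
                          pending: Iterable[str]) -> Dict[str, str]:
    """Reregistration timeout expired: in this sketch, pending tasks are lost."""
    master.recovered.discard(agent_id)
    return {t: "TASK_LOST" for t in pending}


master = MasterState(registered={"a1": {"t1": "TASK_RUNNING"}}, recovered={"a2"})
print(reconcile_explicit(master, [("t1", "a1"), ("t2", "a2")]))
# {'t1': 'TASK_RUNNING', 't2': 'TASK_REREGISTER_PENDING'}
print(on_agent_reregistered(master, "a2", {"t2": "TASK_RUNNING"}, ["t2"]))
# {'t2': 'TASK_RUNNING'}
```

The design choice this illustrates is that the framework can rely on exactly one reply per requested task, followed by ordinary status updates once the pending state is resolved, instead of having to infer meaning from silence.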

This might be consistent with changing the task states so that we capture "task is partitioned" as an explicit task state (TASK_UNKNOWN or TASK_WANDERING); see MESOS-4049.

Issue Links

- is duplicated by: MESOS-6250 Ensure valid task state before connecting with framework on master failover (Resolved)
- is related to: MESOS-4049 Allow user to control behavior of partitioned agents/tasks (Resolved)
- relates to: MESOS-5950 Consider request/response for reconciliation, bulk reconcile (Open)

People

Assignee: Unassigned
Reporter: Neil Conway
Votes: 0
Watchers: 6

Dates

Created: 02/Dec/15 20:51
Updated: 26/Nov/18 13:36