[MESOS-5950] Consider request/response for reconciliation, bulk reconcile - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: master, scheduler api
Labels:
- mesosphere
- reconciliation

Description

The current task reconciliation API has a few quirks:

1. The master sometimes communicates information by sending nothing at all (MESOS-4050). This is very confusing in a distributed system, where messages might also be dropped for other reasons.
2. A framework has no way to determine when the reconciliation results for a given reconciliation request are "complete". That is, when a framework sends a reconciliation request, it starts to receive zero or more task status updates (with reason set to REASON_RECONCILIATION). The framework can't easily determine how many results it should expect to receive.
3. For efficiency (and perhaps to simplify framework logic), it might be better to send a batch of task status updates together in a single message, rather than sending potentially tens of thousands of individual messages.
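To illustrate quirks 1 and 2, here is a minimal sketch of the bookkeeping a framework might do today. The class and method names are hypothetical and are not part of the Mesos scheduler API; only the REASON_RECONCILIATION reason comes from the description above. The framework tracks which tasks it asked about and crosses them off as updates arrive, but it has no signal that the result set is complete.

```python
class PendingReconciliation:
    """Hypothetical helper: tracks task IDs that still lack a reconciliation update."""

    def __init__(self, task_ids):
        self.remaining = set(task_ids)
        self.attempts = 0

    def on_status_update(self, task_id, reason):
        # Only updates generated by reconciliation answer the request.
        if reason == "REASON_RECONCILIATION":
            self.remaining.discard(task_id)

    def next_request(self):
        """Return the task IDs to ask about again, or None if nothing is left."""
        if not self.remaining:
            return None
        self.attempts += 1
        return sorted(self.remaining)


pending = PendingReconciliation(["t1", "t2", "t3"])
pending.on_status_update("t1", "REASON_RECONCILIATION")
pending.on_status_update("t3", "REASON_RECONCILIATION")

# After some timeout the framework can only retry "t2". It cannot tell whether
# the update was dropped or whether the master deliberately sent nothing.
retry = pending.next_request()
```

In this sketch, silence about "t2" is ambiguous, so the only safe option for the framework is to ask again after a timeout.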

For #2, arguably a framework shouldn't need to know when it has seen the "complete" set of results for a reconciliation request. However, supporting a "request/reply" structure for reconciliation can simplify framework logic, especially if a framework might have multiple timers/reasons to be doing reconciliation at the same time.
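One possible shape for such a request/reply exchange is sketched below. Everything here is an assumption made for illustration: the `request_id` field, the `ReconcileResponse` type, the explicit "UNKNOWN" state, and the `FakeMaster` stand-in are hypothetical and do not exist in Mesos. The point is only that a single batched reply tagged with a request ID lets a framework tell concurrent reconciliations apart and know when each one is complete.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TaskStatus:
    task_id: str
    state: str  # "UNKNOWN" is a hypothetical explicit answer for unknown tasks


@dataclass(frozen=True)
class ReconcileResponse:
    request_id: int             # echoes the ID of the request being answered
    statuses: List[TaskStatus]  # one batched entry per requested task


class FakeMaster:
    """Stand-in for the master; answers every requested task explicitly."""

    def __init__(self, known: Dict[str, str]):
        self.known = dict(known)

    def reconcile(self, request_id: int, task_ids: List[str]) -> ReconcileResponse:
        statuses = [TaskStatus(t, self.known.get(t, "UNKNOWN")) for t in task_ids]
        return ReconcileResponse(request_id, statuses)


class Reconciler:
    """Framework-side bookkeeping keyed by request ID."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.pending: Dict[int, str] = {}  # request_id -> why it was started

    def start(self, master: FakeMaster, task_ids: List[str], reason: str) -> ReconcileResponse:
        request_id = next(self._ids)
        self.pending[request_id] = reason
        return master.reconcile(request_id, task_ids)

    def finish(self, response: ReconcileResponse) -> Tuple[str, Dict[str, str]]:
        # The reply is complete by construction, so the request can be closed.
        reason = self.pending.pop(response.request_id)
        return reason, {s.task_id: s.state for s in response.statuses}


master = FakeMaster({"t1": "TASK_RUNNING"})
reconciler = Reconciler()
periodic = reconciler.start(master, ["t1", "t2"], reason="periodic timer")
failover = reconciler.start(master, ["t1"], reason="after failover")

# Replies can be handled in any order because each carries its request ID.
reason, states = reconciler.finish(failover)
```

Because every requested task gets an explicit entry in the reply, a missing reply can only mean a lost message, which addresses quirk 1 as well.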

Issue Links

is related to
- MESOS-4050 Change task reconciliation to not omit unknown tasks (Accepted)

relates to
- MESOS-2308 Task reconciliation API should support data partitioning (Open)
- MESOS-6311 Consider supporting implicit reconciliation per agent (Open)
- MESOS-2456 Add ability to reconcile only unknown tasks (Open)

People

Assignee: Unassigned
Reporter: Neil Conway
Votes: 0
Watchers: 1

Dates

Created: 01/Aug/16 10:02
Updated: 26/Apr/17 16:51