[MESOS-4894] Volumes, reservations can move to new agent IDs after partition - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: agent
Labels:
- mesosphere
- persistent-volumes

Description

If an agent fails health checks, it is removed from the cluster. The next time the agent connects to the master, it is instructed to shut down, and all of its tasks and executors are killed. When the agent is subsequently restarted, it is assigned a new agent ID. Any persistent volumes from the previous agent instance are preserved, but they are now associated with the new agent ID.

This is problematic because volume IDs do not need to be globally unique. Hence, it is natural for frameworks to use the pair <agent-id, volume-id> to uniquely identify a volume. If volume k moves from agent foo to agent bar, it is hard for a framework to determine whether <bar, k> is the "same" volume that was previously called <foo, k>. A framework might be able to infer this from `slaveLost` callbacks, but those are not delivered reliably. The same problem affects the HTTP endpoints for volumes and dynamic reservations, which include a slave ID (that is, an agent ID).
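
To make the ambiguity concrete, here is a minimal sketch of framework-side bookkeeping keyed on <agent-id, volume-id>. It is not Mesos code: the `VolumeRegistry` class, its `record` and `owner_of` methods, and the owner name `database-replica-1` are hypothetical names chosen for illustration. The agent IDs `foo` and `bar` and the volume ID `k` follow the example in the description.

```python
from typing import Dict, Optional, Tuple

# Hypothetical framework-side key: (agent_id, volume_id).
VolumeKey = Tuple[str, str]


class VolumeRegistry:
    """Illustrative bookkeeping a framework might keep for its volumes."""

    def __init__(self) -> None:
        self._owners: Dict[VolumeKey, str] = {}

    def record(self, agent_id: str, volume_id: str, owner: str) -> None:
        # Remember which piece of framework state lives on this volume.
        self._owners[(agent_id, volume_id)] = owner

    def owner_of(self, agent_id: str, volume_id: str) -> Optional[str]:
        # Returns None when the framework has no record for this key.
        return self._owners.get((agent_id, volume_id))


registry = VolumeRegistry()

# The framework creates volume "k" on agent "foo" and records its owner.
registry.record("foo", "k", "database-replica-1")

# Agent "foo" is removed after failing health checks, is restarted, and
# comes back as agent "bar". The data of volume "k" is still on disk,
# but offers now describe it under the new agent ID.
known_before = registry.owner_of("foo", "k")
known_after = registry.owner_of("bar", "k")

print(known_before)  # the owner recorded under the old agent ID
print(known_after)   # no record under the new agent ID
```

Under these assumptions the lookup with the new agent ID finds nothing, so the framework cannot tell a preserved volume from an unrelated volume that merely reuses the ID `k`.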

Issue Links

is related to: MESOS-4049 Allow user to control behavior of partitioned agents/tasks (Resolved)
relates to: MESOS-5368 Consider introducing persistent agent ID (Open)

People

Assignee: Unassigned
Reporter: Neil Conway
Votes: 0
Watchers: 1

Dates

Created: 08/Mar/16 00:21
Updated: 12/May/16 10:40