Mesos / MESOS-4894

Volumes, reservations can move to new agent IDs after partition

Details

    Description

      If an agent fails health checks, it is removed from the cluster. The next time the agent connects to the master, it is instructed to shut down, and all of its tasks and executors are killed. When the agent is next started, it is assigned a new agent ID. Any persistent volumes from the previous agent instance are preserved, but they are now associated with the new agent ID.

      This is problematic because volume IDs do not need to be globally unique, so it is natural for frameworks to use the pair <agent-id, volume-id> to identify a volume uniquely. If volume k moves from agent foo to agent bar, it is hard for a framework to determine whether <bar,k> is the "same" volume that was previously called <foo,k>. (Frameworks might be able to figure this out from `slaveLost` callbacks, but those aren't reliable.) Similarly, the HTTP endpoints for volumes and dynamic reservations include a slave ID.
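
      The lookup failure described above can be illustrated with a minimal sketch. Everything here is hypothetical: `VolumeRegistry`, its methods, and the owner name "db-replica-1" are invented for illustration and are not part of any Mesos API. The sketch only assumes that a framework keys its bookkeeping on the pair <agent-id, volume-id>.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Key a framework might naturally use, since volume IDs are not globally unique.
VolumeKey = Tuple[str, str]  # (agent_id, volume_id)


@dataclass
class VolumeRegistry:
    """Hypothetical framework-side bookkeeping; not a Mesos API."""

    owners: Dict[VolumeKey, str] = field(default_factory=dict)

    def record(self, agent_id: str, volume_id: str, owner: str) -> None:
        # Remember which logical workload owns the volume on this agent.
        self.owners[(agent_id, volume_id)] = owner

    def lookup(self, agent_id: str, volume_id: str) -> Optional[str]:
        # Returns None when the (agent, volume) pair has never been recorded.
        return self.owners.get((agent_id, volume_id))


registry = VolumeRegistry()

# The framework creates volume "k" on agent "foo" for one of its workloads.
registry.record("foo", "k", "db-replica-1")

# Before the partition, the volume is found under its original agent ID.
found_before = registry.lookup("foo", "k")

# After the partition and restart, the same on-disk volume is reported
# under the new agent ID "bar", so the lookup no longer matches.
found_after = registry.lookup("bar", "k")

print(found_before)  # db-replica-1
print(found_after)   # None
```

      In this sketch the data on disk is unchanged, but the framework has no way to tell that <bar,k> and <foo,k> refer to the same volume.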

            People

              Assignee: Unassigned
              Reporter: Neil Conway (neilc)