  1. Mesos
  2. MESOS-1554

Persistent resources support for storage-like services

    Details

    • Type: Epic
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: fetcher
    • Epic Name: Persistence

      Description

      This question came up on the dev mailing list.
      It seems reasonable for storage-like services (e.g. HDFS or Cassandra) to use Mesos to manage their instances. But right now, if we restart an instance (e.g. to spin up a new version), all sandbox filesystem resources of the previous instance will be recycled by the slave's garbage collector.

      At the moment, filesystem resources can only be managed out of band, i.e. instances can save their data in some database-specific place that successive instances can share (e.g. /var/lib/cassandra).

      Benjamin Hindman suggested an idea in the mailing list (though it still needs some fleshing out):

      The idea originally came about because, even today, if we allocate some
      file system space to a task/executor, and then that task/executor
      terminates, we haven't officially "freed" those file system resources until
      after we garbage collect the task/executor sandbox! (We keep the sandbox
      around so a user/operator can get the stdout/stderr or anything else left
      around from their task/executor.)

      To solve this problem we wanted to be able to let a task/executor terminate
      but not give up all of its resources, hence: persistent resources.

      Pushing this concept even further you could imagine always reallocating
      resources to a framework that had already been allocated those resources
      for a previous task/executor. Looked at from another perspective, these are
      "late-binding", or "lazy", resource reservations.

      At one point in time we had considered just doing 'right-of-first-refusal'
      for allocations after a task/executor terminates. But this is really
      insufficient for supporting storage-like frameworks well (and likely even
      harder to reliably implement than 'persistent resources' IMHO).

      There are a ton of things that need to get worked out in this model,
      including (but not limited to), how should a file system (or disk) be
      exposed in order to be made persistent? How should persistent resources be
      returned to a master? How many persistent resources can a framework get
      allocated?
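
      To make the difference between recycling and persisting resources concrete, here is a minimal toy sketch. It is not Mesos code and does not reflect any Mesos API: the class ToyAgent, its methods, and the keep flag are all hypothetical names invented for illustration. It only models the accounting idea from the message above, assuming that disk kept after termination stays charged to the owning framework and can only be reused by that framework.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical disk accounting for one node (illustration only, not Mesos)."""
    total_disk_mb: int
    # framework id -> disk held by its running task/executor
    in_use: dict = field(default_factory=dict)
    # framework id -> disk kept for the framework after termination
    persistent: dict = field(default_factory=dict)

    def free_disk_mb(self) -> int:
        # Persistent disk is NOT free: it stays charged to its framework.
        held = sum(self.in_use.values()) + sum(self.persistent.values())
        return self.total_disk_mb - held

    def launch(self, framework_id: str, disk_mb: int) -> None:
        # A framework may draw on its own persistent disk before the free pool.
        reusable = self.persistent.get(framework_id, 0)
        if disk_mb > reusable + self.free_disk_mb():
            raise ValueError("not enough disk for " + framework_id)
        remaining = reusable - min(reusable, disk_mb)
        if remaining:
            self.persistent[framework_id] = remaining
        else:
            self.persistent.pop(framework_id, None)
        self.in_use[framework_id] = self.in_use.get(framework_id, 0) + disk_mb

    def terminate(self, framework_id: str, keep: bool) -> None:
        # keep=False models today's behaviour (disk returns to the pool);
        # keep=True models a persistent resource.
        disk_mb = self.in_use.pop(framework_id, 0)
        if keep and disk_mb:
            self.persistent[framework_id] = (
                self.persistent.get(framework_id, 0) + disk_mb
            )


agent = ToyAgent(total_disk_mb=1000)
agent.launch("cassandra", 600)
agent.terminate("cassandra", keep=True)   # disk stays with "cassandra"
agent.launch("cassandra", 600)            # the restarted instance reuses it
```

      In this sketch another framework asking for more than the remaining 400 MB would be refused while "cassandra" holds its persistent disk. The open questions listed above (how disk is exposed, how persistent resources are returned, and how many a framework may hold) are deliberately left out of the model.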

              People

              • Assignee:
                mcypark Michael Park
                Reporter:
                nekto0n Nikita Vetoshkin
              • Votes:
                37
                Watchers:
                108
