Details
- Epic
- Status: Resolved
- Critical
- Resolution: Fixed
- None
- None
- Persistence
Description
This question came up on the dev mailing list.
It seems reasonable for storage-like services (e.g. HDFS or Cassandra) to use Mesos to manage their instances. But right now, if we'd like to restart an instance (e.g. to spin up a new version), all sandbox filesystem resources of the previous instance version will be recycled by the slave's garbage collector.
At the moment, filesystem resources can be managed out of band, i.e. instances can save their data in a database-specific location that various instances can share (e.g. /var/lib/cassandra).
benjaminhindman suggested an idea on the mailing list (though it still needs some fleshing out):
The idea originally came about because, even today, if we allocate some
file system space to a task/executor, and then that task/executor
terminates, we haven't officially "freed" those file system resources until
after we garbage collect the task/executor sandbox! (We keep the sandbox
around so a user/operator can get the stdout/stderr or anything else left
around from their task/executor.)

To solve this problem we wanted to be able to let a task/executor terminate
but not give up all of its resources, hence: persistent resources.

Pushing this concept even further you could imagine always reallocating
resources to a framework that had already been allocated those resources
for a previous task/executor. Looked at from another perspective, these are
"late-binding", or "lazy", resource reservations.

At one point in time we had considered just doing 'right-of-first-refusal'
for allocations after a task/executor terminates. But this is really
insufficient for supporting storage-like frameworks well (and likely even
harder to reliably implement than 'persistent resources' IMHO).

There are a ton of things that need to get worked out in this model,
including (but not limited to): how should a file system (or disk) be
exposed in order to be made persistent? How should persistent resources be
returned to a master? How many persistent resources can a framework get
allocated?
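
To make the "late-binding reservation" idea above concrete, here is a toy sketch in Python. It is not the Mesos API: `ToyAgent`, `launch`, `terminate`, `offer_for`, and the `keep` flag are all hypothetical names made up for illustration, and the model assumes a single disk pool measured in MB on one slave. It only shows the accounting difference between releasing disk on termination and keeping it bound to the framework that used it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Toy model of one slave's disk accounting (hypothetical, not Mesos)."""
    total_disk_mb: int
    # framework id -> disk (MB) held by its running tasks
    in_use: dict = field(default_factory=dict)
    # framework id -> disk (MB) kept after a task terminated
    persistent: dict = field(default_factory=dict)

    def free_disk_mb(self) -> int:
        used = sum(self.in_use.values()) + sum(self.persistent.values())
        return self.total_disk_mb - used

    def launch(self, framework: str, disk_mb: int) -> None:
        # Reuse the framework's own persistent disk first, then free disk.
        reusable = min(self.persistent.get(framework, 0), disk_mb)
        if disk_mb - reusable > self.free_disk_mb():
            raise ValueError("not enough disk")
        if reusable:
            self.persistent[framework] -= reusable
            if self.persistent[framework] == 0:
                del self.persistent[framework]
        self.in_use[framework] = self.in_use.get(framework, 0) + disk_mb

    def terminate(self, framework: str, keep: bool) -> None:
        disk_mb = self.in_use.pop(framework, 0)
        if keep and disk_mb:
            # Persistent: the disk stays bound to this framework.
            self.persistent[framework] = (
                self.persistent.get(framework, 0) + disk_mb
            )
        # Otherwise the disk simply returns to the free pool.

    def offer_for(self, framework: str) -> int:
        # A framework sees free disk plus whatever it persisted earlier.
        return self.free_disk_mb() + self.persistent.get(framework, 0)


agent = ToyAgent(total_disk_mb=1000)
agent.launch("cassandra", 600)
agent.terminate("cassandra", keep=True)
print(agent.offer_for("cassandra"))  # 1000: free disk plus its own 600
print(agent.offer_for("other"))      # 400: the persisted disk is withheld
```

In this sketch the persisted disk is only ever offered back to the framework that held it, which is the "always reallocating" behaviour described in the quote. The open questions listed there (how disk is exposed, how it is returned to the master, how much a framework may hold) are deliberately left out.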
Attachments
Issue Links
- blocks
  - MESOS-2727 0.23.0 Release (Resolved)
- is duplicated by
  - MESOS-1902 Support persistent disk resource. (Resolved)
- is related to
  - MESOS-191 Add support for multiple disk resources (In Progress)
  - MESOS-1777 Design persistent resources (Resolved)
- relates to
  - MESOS-2018 Dynamic Reservation (Resolved)
  - MESOS-1961 Ensure executor state is correctly reconciled between master and slave. (Accepted)
  - MESOS-1587 Report disk usage from MesosContainerizer (Resolved)
  - MESOS-1588 Enforce disk quota in MesosContainerizer (Resolved)
  - MESOS-2299 default work_dir of /tmp/mesos is problematic (Resolved)