Details
- Epic
- Status: Resolved
- Critical
- Resolution: Fixed
- None
- None
- Persistence
Description
This question came up on the dev mailing list.
It seems reasonable for storage-like services (e.g. HDFS or Cassandra) to use Mesos to manage their instances. But right now, if we'd like to restart an instance (e.g. to spin up a new version), all of the previous instance's sandbox filesystem resources will be recycled by the slave's garbage collector.
At the moment, filesystem resources can be managed out of band, i.e. instances can save their data in some database-specific place that various instances can share (e.g. /var/lib/cassandra).
Benjamin Hindman suggested an idea on the mailing list (though it still needs some fleshing out):
The idea originally came about because, even today, if we allocate some
file system space to a task/executor, and then that task/executor
terminates, we haven't officially "freed" those file system resources until
after we garbage collect the task/executor sandbox! (We keep the sandbox
around so a user/operator can get the stdout/stderr or anything else left
around from their task/executor.)

To solve this problem we wanted to be able to let a task/executor terminate
but not give up all of its resources, hence: persistent resources.

Pushing this concept even further you could imagine always reallocating
resources to a framework that had already been allocated those resources
for a previous task/executor. Looked at from another perspective, these are
"late-binding", or "lazy", resource reservations.

At one point in time we had considered just doing 'right-of-first-refusal'
for allocations after a task/executor terminates. But this is really
insufficient for supporting storage-like frameworks well (and likely even
harder to reliably implement than 'persistent resources' IMHO).

There are a ton of things that need to get worked out in this model,
including (but not limited to): how should a file system (or disk) be
exposed in order to be made persistent? How should persistent resources be
returned to a master? How many persistent resources can a framework get
allocated?
Attachments
Issue Links
- blocks
  - MESOS-2727 0.23.0 Release (Resolved)
- is duplicated by
  - MESOS-1902 Support persistent disk resource. (Resolved)
- is related to
  - MESOS-191 Add support for multiple disk resources (In Progress)
  - MESOS-1777 Design persistent resources (Resolved)
- relates to
  - MESOS-2018 Dynamic Reservation (Resolved)
  - MESOS-1961 Ensure executor state is correctly reconciled between master and slave. (Accepted)
  - MESOS-1587 Report disk usage from MesosContainerizer (Resolved)
  - MESOS-1588 Enforce disk quota in MesosContainerizer (Resolved)
  - MESOS-2299 default work_dir of /tmp/mesos is problematic (Resolved)
Issues in epic
| MESOS-3413 | Docker containerizer does not symlink persistent volumes into sandbox | Resolved | Timothy Chen |
| MESOS-4281 | Correctly handle disk quota usage when volumes are bind mounted into the container. | Resolved | Artem Harutyunyan |
| MESOS-2210 | Disallow special characters in role. | Resolved | haosdent |
| MESOS-3987 | /create-volumes, /destroy-volumes should be permissive under a master without authentication. | Resolved | Unassigned |
| MESOS-3064 | Add 'principal' field to 'Resource.DiskInfo.Persistence' | Resolved | Greg Mann |
| MESOS-3065 | Add framework authorization for persistent volume | Resolved | Greg Mann |
| MESOS-2408 | Slave should reclaim storage for destroyed persistent volumes. | Resolved | Neil Conway |
| MESOS-2455 | Add operator endpoints to create/destroy persistent volumes. | Resolved | Neil Conway |
| MESOS-2123 | Document changes in C++ Resources API in CHANGELOG. | Resolved | Jie Yu |
| MESOS-2031 | Manage persistent directories on slave. | Resolved | Jie Yu |
| MESOS-2100 | Implement master to slave protocol for persistent disk resources. | Resolved | Jie Yu |
| MESOS-2305 | Refactor validators in Master. | Resolved | Jie Yu |
| MESOS-2030 | Maintain persistent disk resources in master memory. | Resolved | Jie Yu |
| MESOS-2099 | Support acquiring/releasing resources with DiskInfo in allocator. | Resolved | Benjamin Mahler |
| MESOS-1974 | Refactor the C++ Resources abstraction for DiskInfo | Resolved | Jie Yu |
| MESOS-2029 | Allow slave to checkpoint resources. | Resolved | Jie Yu |
| MESOS-2097 | Update Resource protobuf with DiskInfo | Resolved | Jie Yu |
| MESOS-1777 | Design persistent resources | Resolved | Jie Yu |
| MESOS-2098 | Update task validation to be after task authorization. | Resolved | Jie Yu |
| MESOS-2101 | Add the persistent resources release primitive to the framework API | Resolved | Jie Yu |
| MESOS-2135 | Support DiskInfo in C++ Resources | Resolved | Jie Yu |
| MESOS-2404 | Add an example framework to test persistent volumes. | Resolved | Jie Yu |
| MESOS-2405 | Add user doc for using persistent volumes. | Resolved | Michael Park |
| MESOS-2427 | Add Java binding for the acceptOffers API. | Resolved | Jie Yu |
| MESOS-2428 | Add Python bindings for the acceptOffers API. | Resolved | Jie Yu |
| MESOS-2434 | Add an example framework to test persistent volumes | Resolved | Unassigned |
| MESOS-2603 | Permissions and ownership of persistent volumes are not set correctly. | Resolved | haosdent |
| MESOS-2955 | Introduce acceptOffers scheduler driver API for performing operations on Offers | Resolved | Unassigned |
| MESOS-3124 | Updating persistent volumes after slave restart is problematic. | Resolved | Jie Yu |
| MESOS-3867 | Make `Resource.DiskInfo.Persistence.principal` a required field | Resolved | Greg Mann |
| MESOS-3903 | Add authorization for '/create-volume' and '/destroy-volume' HTTP endpoints | Resolved | Greg Mann |
| MESOS-4178 | Add persistent volume support to the Authorizer | Resolved | Greg Mann |
| MESOS-4179 | Extend `Master` to authorize persistent volumes | Resolved | Greg Mann |
| MESOS-4198 | Disk Resource Reservation is NOT Enforced for Persistent Volumes | Resolved | Artem Harutyunyan |
| MESOS-4395 | Add persistent volume endpoint tests with no principal | Resolved | Greg Mann |
| MESOS-4539 | Exclude paths in Posix disk isolator should be absolute paths. | Resolved | Jie Yu |
| MESOS-4824 | "filesystem/linux" isolator does not unmount orphaned persistent volumes | Resolved | Joseph Wu |
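Several of the issues in this epic refer to `Resource.DiskInfo.Persistence` and its `principal` field. As a rough illustration of what such a resource might look like when written out as JSON, here is a minimal Python sketch. The helper name `persistent_volume` and all values (role, principal, volume ID, size, container path) are hypothetical, and the field layout follows the `Resource` protobuf as I understand it; check it against the Mesos version in use, since reservation requirements and required fields changed across releases.

```python
import json


def persistent_volume(role, principal, volume_id, size_mb, container_path):
    """Build a dict shaped like a Mesos `Resource` carrying a persistent volume.

    Hypothetical helper for illustration only; the nesting mirrors
    Resource -> DiskInfo -> Persistence / Volume as assumed above.
    """
    return {
        "name": "disk",
        "type": "SCALAR",
        "scalar": {"value": size_mb},
        "role": role,
        "disk": {
            # Identifies the volume so it can outlive the task's sandbox.
            "persistence": {"id": volume_id, "principal": principal},
            # Where the volume appears inside the task's sandbox.
            "volume": {"container_path": container_path, "mode": "RW"},
        },
    }


# All values below are made up for the example.
volume = persistent_volume(
    role="cassandra",
    principal="cassandra-framework",
    volume_id="cassandra-data-0",
    size_mb=2048,
    container_path="data",
)
print(json.dumps([volume], indent=2))
```

A framework restarting an instance would then ask for the same persistence ID again, rather than relying on a shared out-of-band path such as /var/lib/cassandra.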