Mesos / MESOS-1554

Persistent resources support for storage-like services

    Details

    • Type: Epic
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: fetcher
    • Epic Name:
      Persistence

      Description

      This question came up in dev mailing list.
      It seems reasonable for storage-like services (e.g. HDFS or Cassandra) to use Mesos to manage their instances. But right now, if we'd like to restart an instance (e.g. to spin up a new version), all of the previous instance's sandbox filesystem resources will be recycled by the slave's garbage collector.

      At the moment, filesystem resources can be managed out of band - i.e. instances can save their data in some database-specific place that various instances can share (e.g. /var/lib/cassandra).

      Benjamin Hindman suggested an idea on the mailing list (though it still needs some fleshing out):

      The idea originally came about because, even today, if we allocate some
      file system space to a task/executor, and then that task/executor
      terminates, we haven't officially "freed" those file system resources until
      after we garbage collect the task/executor sandbox! (We keep the sandbox
      around so a user/operator can get the stdout/stderr or anything else left
      around from their task/executor.)

      To solve this problem we wanted to be able to let a task/executor terminate
      but not give up all of its resources, hence: persistent resources.

      Pushing this concept even further you could imagine always reallocating
      resources to a framework that had already been allocated those resources
      for a previous task/executor. Looked at from another perspective, these are
      "late-binding", or "lazy", resource reservations.

      At one point in time we had considered just doing 'right of first refusal'
      for allocations after a task/executor terminates. But this is really
      insufficient for supporting storage-like frameworks well (and likely even
      harder to reliably implement than 'persistent resources' IMHO).

      There are a ton of things that need to get worked out in this model,
      including (but not limited to), how should a file system (or disk) be
      exposed in order to be made persistent? How should persistent resources be
      returned to a master? How many persistent resources can a framework get
      allocated?
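      To make the "persistent resources" idea concrete, here is a minimal sketch of how a persistent disk resource might be described, written as a plain Python data structure. This is purely illustrative: the field names loosely mirror the shape a disk resource with a persistence ID and a container-side mount path could take, and are assumptions for this sketch, not the actual Mesos protobuf schema.

      ```python
      # Illustrative sketch only: a disk resource that a framework could keep
      # across task/executor restarts. Field names are hypothetical, not the
      # real Mesos API.

      def make_persistent_disk(role, persistence_id, container_path, megabytes):
          """Describe a disk resource that survives task/executor termination."""
          return {
              "name": "disk",
              "type": "SCALAR",
              "scalar": {"value": megabytes},
              "role": role,  # reserved for this framework's role
              "disk": {
                  # The persistence ID lets the master re-offer the same
                  # volume back to the framework after the task terminates.
                  "persistence": {"id": persistence_id},
                  "volume": {
                      "container_path": container_path,  # where the task sees it
                      "mode": "RW",
                  },
              },
          }

      # e.g. a Cassandra instance keeping 2 GB of data across restarts:
      volume = make_persistent_disk("cassandra", "vol-1", "data", 2048)
      ```

      The key design point the description argues for is visible in the sketch: the volume's identity (the persistence ID) belongs to the resource, not to the task, so terminating the task does not free the disk.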

    Issue Links

    Issues in Epic

    Activity

            wolframarnold Wolfram Arnold added a comment -

            I'd love to see this feature for running Elasticsearch instances on mesos.

            stevenschlansker Steven Schlansker added a comment -

            It would be nice to be able to manage e.g. Amazon EBS (or generic SAN) volumes in this way. That would be very powerful indeed.

            dmontauk Dobromir Montauk added a comment -

            Separating "resources" from the "running job" makes a lot of sense. That's how Borg at Google works.

            They have a separate concept, "allocation", that you can use (but don't have to). You define an allocation just like a task/job - how much CPU, RAM, etc it gets. Then you can put your tasks "into" the allocation. They have their own CPU, RAM, etc requirements and obviously have to fit.

            Borg then has separate commands for allocations and jobs. If you just touch the job (up/down/restart/etc), then the allocation sticks around and can be reused. All disk resources, CPU reservations, etc. are still there. Note that allocations must support more than just "persistent disk" - otherwise, there's a chance that the job won't schedule because CPU/RAM is used by someone else, and you've just lost all your "persistence" benefits! To wipe away the job entirely, you have to remove the allocation itself (which, being very dangerous, was usually secured with a different permission set than the job).

            It looks like the design right now is mostly around "persistent disk" but I'm not sure that's really going to work longer-term. We should make "allocations" first-class objects that, like tasks, can reserve anything, and have jobs just running inside an alloc.
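            The allocation-vs-job split described above can be sketched as follows. This is conceptual pseudocode for the idea in the comment, not a Borg or Mesos API; the class and method names are invented for illustration.

            ```python
            # Conceptual sketch of the Borg-style "allocation" idea: an
            # allocation reserves CPU/RAM/disk; tasks run inside it and must
            # fit. Stopping a task frees its share, but the allocation (and
            # its disk) sticks around for reuse.

            class Allocation:
                def __init__(self, cpus, ram_mb, disk_mb):
                    self.cpus, self.ram_mb, self.disk_mb = cpus, ram_mb, disk_mb
                    self.tasks = {}

                def start_task(self, name, cpus, ram_mb):
                    # A task must fit inside what the allocation has left.
                    used_cpu = sum(t["cpus"] for t in self.tasks.values())
                    used_ram = sum(t["ram_mb"] for t in self.tasks.values())
                    if used_cpu + cpus > self.cpus or used_ram + ram_mb > self.ram_mb:
                        raise ValueError("task does not fit in allocation")
                    self.tasks[name] = {"cpus": cpus, "ram_mb": ram_mb}

                def stop_task(self, name):
                    # The allocation keeps its reservation after the task exits,
                    # so a restarted task cannot lose its slot to someone else.
                    self.tasks.pop(name)

            alloc = Allocation(cpus=4, ram_mb=8192, disk_mb=100_000)
            alloc.start_task("cassandra", cpus=2, ram_mb=4096)
            alloc.stop_task("cassandra")              # restart: allocation persists
            alloc.start_task("cassandra-v2", cpus=2, ram_mb=4096)
            ```

            The sketch also shows why the comment insists allocations must cover more than disk: if only the disk persisted, the restarted task could fail to schedule for lack of CPU/RAM.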

            dmontauk Dobromir Montauk added a comment -

            Just got pointed to https://issues.apache.org/jira/browse/MESOS-2018 which is what I was looking for. Exciting!

            vaibhavkhanduja Vaibhav Khanduja added a comment -

            Managing SAN, iSCSI, or even Amazon EBS volumes is something that should be worked on. There are a number of scenarios that would require interacting with backend storage, from initial provisioning to expansion of space. A framework that connects to such backend services could be built with callbacks, hooks, or extensions in the executors. The garbage collection (releasing) of resources could be asynchronous or a timed, scheduled activity, similar to the Java JVM. Such scheduled GC would also enable the use of extended data services on the backend data.

            adam-mesos Adam B added a comment -

            This Epic/feature is critical for stateful frameworks in Mesos 0.23 and beyond. Upgraded Priority to Critical.

            adam-mesos Adam B added a comment -

            Michael Park, Jie Yu, what's left before we can say that "Persistent Volumes" has shipped?
            Can we move the unresolved tasks from this JIRA into a Persistent Volumes v2 Epic, so we can close this one out?


              People

              • Assignee:
                mcypark Michael Park
              • Reporter:
                nekto0n Nikita Vetoshkin
              • Votes:
                37
              • Watchers:
                110
