Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
See the overview doc at YARN-4692; the relevant sub-section is copied here to track all related efforts.
Though it is desirable to not special-case services anywhere in YARN, there are a few key areas where such special recognition is not only unavoidable but necessary. For example, preemption and reservation have different implications for long-running containers than for regular ones.
Preemption of resources in YARN today works by killing containers. Preempting long-running containers is therefore different from, and costlier for the apps than, preempting regular ones. For many long-lived applications, preemption by killing is likely not an option they can tolerate at all.
[Task] Preemption also means that the scheduler should avoid allocating long-running containers on borrowed resources.
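As a rough illustration of the task above, the check could look like the sketch below. It is not taken from any YARN scheduler: the class `BorrowedResourceCheck`, the `QueueUsage` type, and the memory-only accounting are all hypothetical, and "borrowed" is assumed to mean usage above a queue's guaranteed capacity.

```java
/**
 * Hypothetical sketch: keep long-running containers off borrowed resources.
 * None of these types exist in YARN; memory is the only resource modeled.
 */
final class BorrowedResourceCheck {

    /** Simplified, assumed view of a queue's capacity and usage, in MB. */
    static final class QueueUsage {
        final long guaranteedMb; // capacity the queue is entitled to
        final long maxMb;        // ceiling including borrowed capacity
        final long usedMb;       // memory currently allocated to the queue

        QueueUsage(long guaranteedMb, long maxMb, long usedMb) {
            this.guaranteedMb = guaranteedMb;
            this.maxMb = maxMb;
            this.usedMb = usedMb;
        }
    }

    /**
     * Regular containers may grow the queue up to its maximum capacity.
     * Long-running containers must fit within the guaranteed share, since
     * anything above it is borrowed and may be preempted later.
     */
    static boolean canAllocate(QueueUsage queue, long requestMb, boolean longRunning) {
        long usedAfter = queue.usedMb + requestMb;
        long limit = longRunning ? queue.guaranteedMb : queue.maxMb;
        return usedAfter <= limit;
    }
}
```

With a queue guaranteed 8192 MB, capped at 16384 MB, and already using 7168 MB, a 2048 MB request would be accepted for a regular container but rejected for a long-running one under this sketch.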
On the other hand, today’s scheduler creates a reservation when it cannot fit a container on a machine with free resources. When making such reservations for containers of long-running services, the scheduler shouldn’t queue the reservation behind other services running on the node; otherwise the reservation may remain unfulfilled forever.
[Task] Preemption and reservation logic thus needs to understand whether an application has long-running containers and make decisions accordingly.
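One way to picture the reservation side of this is the sketch below. It is an assumption-laden illustration, not existing scheduler code: `ReservationPolicy` and `RunningContainer` are hypothetical names, only memory is modeled, and the rule applied is that a reservation for a service container counts only resources that are free now or held by containers expected to finish.

```java
import java.util.List;

/**
 * Hypothetical sketch of a reservation decision that is aware of
 * long-running containers. These types are illustrative, not YARN APIs.
 */
final class ReservationPolicy {

    /** Assumed minimal view of a container already running on a node. */
    static final class RunningContainer {
        final long memMb;
        final boolean longRunning;

        RunningContainer(long memMb, boolean longRunning) {
            this.memMb = memMb;
            this.longRunning = longRunning;
        }
    }

    /**
     * Returns true if the scheduler should reserve space on this node.
     * A request that fits right now needs no reservation. A regular
     * request keeps today's behavior. A long-running request is reserved
     * only if free memory plus memory held by non-service containers
     * (which will eventually exit) can cover it.
     */
    static boolean shouldReserve(long freeMb, List<RunningContainer> running,
                                 long requestMb, boolean requestLongRunning) {
        if (requestMb <= freeMb) {
            return false; // can be allocated immediately
        }
        if (!requestLongRunning) {
            return true; // unchanged behavior for regular containers
        }
        long eventuallyFreeMb = freeMb;
        for (RunningContainer c : running) {
            if (!c.longRunning) {
                eventuallyFreeMb += c.memMb;
            }
        }
        return eventuallyFreeMb >= requestMb;
    }
}
```

The design choice here is to skip the node rather than wait: if the space a service request needs is held by other services, the reservation would never be fulfilled, so the scheduler is better off looking elsewhere.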
[Task] There is an existing JIRA, YARN-1039 (Add parameter for YARN resource requests to indicate "long lived"), which was filed to address some of this special recognition of service containers. The options discussed there were a boolean flag or a long representing the lifecycle, though in practice I think we will need both.
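To make the "both" option concrete, here is a minimal sketch of a hint that carries a flag and a long together. It does not reflect what YARN-1039 defines: `LifetimeHint`, its fields, the `UNKNOWN` sentinel, and the reading of the long as an expected lifetime in milliseconds are all assumptions made for illustration.

```java
/**
 * Hypothetical lifetime hint attached to a resource request.
 * Not a YARN API; the long is assumed to be an expected lifetime in ms.
 */
final class LifetimeHint {

    /** Sentinel for "no lifetime estimate available" (assumed convention). */
    static final long UNKNOWN = -1L;

    final boolean longLived;       // explicit declaration by the application
    final long expectedLifetimeMs; // estimate, or UNKNOWN

    LifetimeHint(boolean longLived, long expectedLifetimeMs) {
        if (expectedLifetimeMs < UNKNOWN) {
            throw new IllegalArgumentException("lifetime must be >= 0 or UNKNOWN");
        }
        this.longLived = longLived;
        this.expectedLifetimeMs = expectedLifetimeMs;
    }

    /** A service that runs until stopped: flagged, with no usable estimate. */
    static LifetimeHint service() {
        return new LifetimeHint(true, UNKNOWN);
    }

    /**
     * The scheduler treats a container as long running if the application
     * says so, or if its estimated lifetime meets the given threshold.
     */
    boolean treatAsLongRunning(long thresholdMs) {
        if (longLived) {
            return true;
        }
        return expectedLifetimeMs != UNKNOWN && expectedLifetimeMs >= thresholdMs;
    }
}
```

The flag covers services that cannot give an estimate, while the long lets the scheduler apply its own threshold to containers that can, which is one reason having both may be useful.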
Issue Links
- is part of: YARN-4692 [Umbrella] Simplified and first-class support for services in YARN (Reopened)