Description
As this idea sparked a fair amount of discussion on YARN-2, I'd like to go deeper into the reasoning.
Currently the virtual core abstraction conflates two orthogonal goals. The first is that a cluster might have heterogeneous hardware, and the processing power of different makes of cores can vary wildly. The second is that different workloads (or combinations of workloads) can require different levels of granularity. E.g. one admin might want every task on their cluster to use at least a full core, while another might want applications to be able to request quarters of cores. The former would configure a single vcore per core; the latter would configure four vcores per core.
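For concreteness, a sketch of how those two admins might set up the same machine. The property is the NodeManager's vcore setting; the 8-core node and the exact values are assumptions chosen only for illustration.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class VcoreConfigExample {
  public static void main(String[] args) {
    // Hypothetical illustration: the same 8-core NodeManager configured two ways.
    // The node size and the specific values are assumptions for the example.

    // Admin A: one vcore per physical core -- every task occupies at least a core.
    Configuration oneVcorePerCore = new Configuration();
    oneVcorePerCore.setInt("yarn.nodemanager.resource.cpu-vcores", 8);

    // Admin B: four vcores per physical core -- a request for one vcore
    // corresponds to a quarter of a physical core.
    Configuration fourVcoresPerCore = new Configuration();
    fourVcoresPerCore.setInt("yarn.nodemanager.resource.cpu-vcores", 32);
  }
}
{code}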
I don't think that the abstraction is a good way of handling the second goal. Having virtual cores refer to different magnitudes of processing power on different clusters will make the already difficult problem of deciding how many cores to request for a job even more confusing.
Can we not handle this with dynamic oversubscription?
Dynamic oversubscription, i.e. adjusting the number of cores offered by a machine based on measured CPU consumption, should work as a complement to fine-granularity scheduling, but it is never going to be perfect, as the amount of CPU a process consumes can vary widely over its lifetime. A task that first loads a bunch of data over the network and then performs complex computations on it will suffer if additional CPU-heavy tasks are scheduled on the same node just because its initial CPU utilization was low. To guard against this, we will need to be conservative in how we dynamically oversubscribe. If a user wants to explicitly hint to the scheduler that their task will not use much CPU, the scheduler should be able to take this into account.
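As a sketch of what "conservative" could look like in practice (the class, method, discount factor, and cap below are made up for illustration and are not an existing or proposed YARN interface):

{code:java}
/**
 * Hypothetical sketch of conservative dynamic oversubscription.
 * The 25% discount and the 1.5x cap are arbitrary assumptions chosen only
 * to illustrate the idea of re-offering only part of the measured slack.
 */
public class ConservativeOversubscription {

  /**
   * @param configuredVcores   vcores the node advertises statically
   * @param measuredBusyVcores vcores' worth of CPU currently measured as in use
   * @return vcores to offer, adding back only a fraction of the measured slack
   */
  public static int vcoresToOffer(int configuredVcores, double measuredBusyVcores) {
    double idle = Math.max(0.0, configuredVcores - measuredBusyVcores);
    // Be conservative: only re-offer a quarter of the measured idle capacity,
    // so a task whose CPU use is about to spike still has headroom.
    double extra = 0.25 * idle;
    // Never offer more than 1.5x the configured capacity.
    double offered = Math.min(configuredVcores + extra, 1.5 * configuredVcores);
    return (int) Math.floor(offered);
  }

  public static void main(String[] args) {
    // A 16-vcore node currently measured at 4 vcores of real usage would
    // offer 16 + 0.25 * 12 = 19 vcores under this (made-up) policy.
    System.out.println(vcoresToOffer(16, 4.0)); // prints 19
  }
}
{code}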
On YARN-2, there are concerns that including floating point arithmetic in the scheduler will slow it down. I question this assumption, and it is perhaps worth debating, but I think we can sidestep the issue by multiplying CPU quantities inside the scheduler by a decently sized number like 1000 and keeping the computations in integers.
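A sketch of that scaling idea, with hypothetical names: treat 1000 integer units as one core, so requests for fractions of a core never force floating point into the scheduler's hot path.

{code:java}
/**
 * Sketch of keeping fractional CPU requests in integer arithmetic by scaling
 * by 1000 ("milli-cores"). The names here are illustrative, not a proposed API.
 */
public class MilliCoreExample {
  private static final int SCALE = 1000;   // 1000 units == 1 core

  /** Convert a fractional core request (e.g. 0.25) to integer units. */
  public static int toMilliCores(double cores) {
    return (int) Math.round(cores * SCALE);
  }

  public static void main(String[] args) {
    int nodeCapacity = 8 * SCALE;          // 8 cores = 8000 units
    int taskRequest  = toMilliCores(0.25); // a quarter of a core = 250 units

    // The scheduler's comparisons and bookkeeping stay purely integral.
    boolean fits = taskRequest <= nodeCapacity;
    int remaining = nodeCapacity - taskRequest;

    System.out.println(fits);       // true
    System.out.println(remaining);  // 7750
  }
}
{code}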
The relevant APIs are marked as evolving, so there's no need for the change to delay 2.1.0-beta.