[YARN-957] Capacity Scheduler tries to reserve the memory more than what node manager reports. - ASF JIRA

I have 2 node managers.

nm1, with 1024 MB of memory.
nm2, with 2048 MB of memory.
I am submitting a simple MapReduce application with one mapper and one reducer, 1024 MB each. The steps to reproduce are:
Stop nm2 (2048 MB). I do this to make sure that this node's heartbeat does not reach the RM first.
Submit the application. As soon as the RM receives the heartbeat from the first node (nm1), it tries to reserve memory for the AM container (2048 MB) on that node, even though nm1 has only 1024 MB.
Start nm2 (2048 MB).

The application then hangs forever. There are two potential issues here.

The scheduler should not reserve memory on a node manager that can never provide the requested amount. Here the maximum capability of the node manager is 1024 MB, yet 2048 MB is reserved on it.
Say 2048 MB is reserved on nm1, and nm2 then comes back with 2048 MB of available memory. If the original request was made without any locality constraint, the scheduler should unreserve the memory on nm1 and allocate the requested 2048 MB container on nm2.
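
The two expected behaviours can be sketched as a small simulation. This is only an illustrative model, not CapacityScheduler code: Node, Request and on_heartbeat are hypothetical names, and the real scheduler tracks far more state (queues, users, priorities).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a node manager as seen by the scheduler."""
    name: str
    total_mb: int          # maximum capability the node reported
    used_mb: int = 0
    reserved_mb: int = 0   # memory reserved here for a pending container

    @property
    def available_mb(self) -> int:
        return self.total_mb - self.used_mb


@dataclass
class Request:
    """Hypothetical container request; any_host=True means no locality constraint."""
    memory_mb: int
    any_host: bool = True
    reserved_on: Optional[Node] = None
    allocated_on: Optional[Node] = None


def on_heartbeat(node: Node, req: Request) -> str:
    """Handle one node heartbeat for a single pending request."""
    if req.allocated_on is not None:
        return "already allocated"

    # Issue 1: never reserve on a node whose total capability is too small.
    if req.memory_mb > node.total_mb:
        return "skipped"

    if req.memory_mb <= node.available_mb:
        held = req.reserved_on
        if held is not None and held is not node and not req.any_host:
            # A locality-bound request keeps its reservation where it is.
            return "skipped"
        # Issue 2: release any reservation (possibly on another node) and allocate here.
        if held is not None:
            held.reserved_mb = 0
            req.reserved_on = None
        node.used_mb += req.memory_mb
        req.allocated_on = node
        return "allocated"

    # The node is large enough in total but busy right now: reserving is reasonable.
    if req.reserved_on is None:
        node.reserved_mb = req.memory_mb
        req.reserved_on = node
        return "reserved"
    return "skipped"


# Scenario from this report: nm1 heartbeats first, nm2 joins later.
nm1 = Node("nm1", total_mb=1024)
nm2 = Node("nm2", total_mb=2048)
am = Request(memory_mb=2048)
print(on_heartbeat(nm1, am))  # no reservation is made on the 1024 MB node
print(on_heartbeat(nm2, am))  # the AM container is placed on nm2
```

With the first check in place, the 2048 MB AM container is never reserved on nm1, so the application can start as soon as nm2 heartbeats. The second check covers the case where a reservation already exists on another node.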

blocks

YARN-880 Configuring map/reduce memory equal to nodemanager's memory, hangs the job execution

YARN-713 ResourceManager can exit unexpectedly if DNS is unavailable

duplicates

YARN-1076 RM gets stuck with a reservation, ignoring new containers

is related to

YARN-394 RM should be able to return requests that it cannot fulfill

YARN-389 Infinitely assigning containers when the required resource exceeds the cluster's absolute capacity

YARN-1592 CapacityScheduler tries to reserve more than a node's total memory on branch-0.23
