Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
There are some scenarios in which an AM never gets its containers and waits indefinitely. We hit one such scenario, which left the applications hung:
Consider a cluster with 2 NMs, each with 8 GB of memory.
Two MR2 applications are launched in the default queue, and each AM takes 2 GB.
One AM is placed on each NM. Each AM then requests a container with 7 GB of memory.
Since each NM has only 6 GB available, neither request can be satisfied and both applications hang forever.
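The arithmetic behind the hang can be checked with a small model. This is a minimal illustrative sketch, not YARN scheduler code: the class `StarvationCheck`, the `Node` record and `canPlace` are hypothetical names, and only memory is modelled (in MB). It shows that the cluster has 12 GB free in aggregate, more than the 7 GB requested, yet no single NM can host the container.

```java
import java.util.List;

/**
 * Hypothetical model of the scenario above (not YARN code).
 * Only memory is considered; CPU and other resources are ignored.
 */
public class StarvationCheck {

    /** Hypothetical node model: total memory and memory already allocated, in MB. */
    public record Node(String name, int capacityMb, int allocatedMb) {
        public int availableMb() {
            return capacityMb - allocatedMb;
        }
    }

    /** Returns true if at least one node has enough free memory for the request. */
    public static boolean canPlace(List<Node> nodes, int requestMb) {
        for (Node n : nodes) {
            if (n.availableMb() >= requestMb) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // 2 NMs with 8 GB each; one 2 GB AM is running on each of them.
        List<Node> nodes = List.of(
                new Node("nm1", 8 * 1024, 2 * 1024),
                new Node("nm2", 8 * 1024, 2 * 1024));
        int requestMb = 7 * 1024; // each AM asks for one 7 GB container

        // Each node has 6144 MB free, so the 7168 MB request fits nowhere
        // as long as both AMs keep holding their 2 GB.
        System.out.println("available per node (MB): " + nodes.get(0).availableMb()); // 6144
        System.out.println("request placeable: " + canPlace(nodes, requestMb));      // false
    }
}
```

Because neither AM releases its own 2 GB while it waits, the state never changes, which is why the applications hang rather than merely queue.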
To avoid such scenarios, I would like to propose
a generic timeout feature for all AMs in YARN: if no containers are assigned to an application for a defined period, YARN can time out the application attempt.
The default can be 0, in which case the RM never times out the app attempt, and users can set their own timeout when submitting an application.
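The proposed rule can be sketched as a small policy object. This is a hypothetical illustration of the proposal, not an existing YARN API or configuration key: `AttemptTimeoutPolicy`, `shouldTimeOut` and its parameters are names assumed here. It also assumes, beyond what the proposal states, that the clock only matters while the application has outstanding container requests, so an application that is simply running on the containers it already holds is never timed out.

```java
/**
 * Hypothetical sketch of the proposed AM attempt timeout.
 * Names are illustrative assumptions, not YARN APIs or config keys.
 */
public class AttemptTimeoutPolicy {

    /** Per-application timeout in milliseconds; 0 disables the check (proposed default). */
    private final long timeoutMs;

    public AttemptTimeoutPolicy(long timeoutMs) {
        if (timeoutMs < 0) {
            throw new IllegalArgumentException("timeout must be >= 0");
        }
        this.timeoutMs = timeoutMs;
    }

    /**
     * Returns true if the attempt should be timed out.
     *
     * @param hasPendingRequests assumption: only count time while the AM is asking for containers
     * @param lastAssignmentMs   time of the last container assignment (or attempt start if none yet)
     * @param nowMs              current time
     */
    public boolean shouldTimeOut(boolean hasPendingRequests, long lastAssignmentMs, long nowMs) {
        if (timeoutMs == 0 || !hasPendingRequests) {
            return false; // disabled, or the AM is not waiting for anything
        }
        return nowMs - lastAssignmentMs >= timeoutMs;
    }

    public static void main(String[] args) {
        long start = 0L;
        long now = 15 * 60 * 1000L; // 15 minutes without any container assignment
        AttemptTimeoutPolicy disabled = new AttemptTimeoutPolicy(0);
        AttemptTimeoutPolicy tenMinutes = new AttemptTimeoutPolicy(10 * 60 * 1000L);

        System.out.println(disabled.shouldTimeOut(true, start, now));    // false
        System.out.println(tenMinutes.shouldTimeOut(true, start, now));  // true
        System.out.println(tenMinutes.shouldTimeOut(false, start, now)); // false
    }
}
```

In the scenario above, both AMs would have pending 7 GB requests and no new assignments, so with a non-zero timeout each attempt would eventually be timed out instead of hanging indefinitely.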