Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
2.7.1
None
Description
yarn.resourcemanager.am.max-attempts is set to 2 in my cluster.
I submitted a YARN application (a Slider app) with keepContainers=true and attemptFailuresValidityInterval=300000.
It worked properly when the AM failed the first time:
all containers launched by the previous AM were resynced with the new AM (attempt 2) without being killed.
After 10 minutes, I expected the AM failure count to have been reset by attemptFailuresValidityInterval (300000 ms = 5 minutes).
However, all containers were killed when the AM failed a second time, even though a new AM (attempt 3) was launched properly.
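
The expectation above can be illustrated with a small model of a sliding failure window. This is only a sketch of the behavior I expected, not the ResourceManager's actual code: the function names (counted_failures, attempts_exhausted) and the timestamps are hypothetical, and the rule that a non-positive interval disables the window is an assumption made for the example.

```python
# Illustrative model only; names and timestamps are hypothetical.
MAX_ATTEMPTS = 2                # yarn.resourcemanager.am.max-attempts
VALIDITY_INTERVAL_MS = 300_000  # attemptFailuresValidityInterval (5 minutes)


def counted_failures(failure_times_ms, now_ms, interval_ms):
    """Count AM failures that still fall inside the validity window."""
    if interval_ms <= 0:
        # Assumption: a non-positive interval means no window, so all failures count.
        return len(failure_times_ms)
    return sum(1 for t in failure_times_ms if now_ms - t <= interval_ms)


def attempts_exhausted(failure_times_ms, now_ms, interval_ms, max_attempts):
    """True when the counted failures have reached the attempt limit."""
    return counted_failures(failure_times_ms, now_ms, interval_ms) >= max_attempts


# First failure at t=0, second failure 10 minutes later.
failures = [0, 600_000]
now = 600_000

# With the 5-minute window, the first failure has expired, so only one counts.
expected = attempts_exhausted(failures, now, VALIDITY_INTERVAL_MS, MAX_ATTEMPTS)

# Without a window, both failures count and the limit of 2 is reached.
no_window = attempts_exhausted(failures, now, -1, MAX_ATTEMPTS)
```

Under this model the second failure should not be treated as the final attempt, so the containers would be expected to survive it. The observed behavior (containers killed) looks like the case where the window is not applied to that decision.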