Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.0.5-alpha
None
None
Description
Configuration:
yarn.resourcemanager.am.max-retries = 3
Scenario:
The NodeManager is killed forcefully, i.e. using kill -9 NM_PID.
After the node expires, the RM kills all the containers running on this NodeManager.
However, the MRAppMaster JVM is still running.
Because the AM retry count is configured as 3, the RM spawns a second MRAppMaster attempt. At this point, two MRAppMasters are running in parallel for the same job ID.
The problem with running two MRAppMasters is that the first attempt deletes the job information from HDFS, which causes a FileNotFoundException in the second attempt.
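The race can be modeled with a minimal Python sketch, assuming a local temporary directory stands in for HDFS and a single hypothetical job.xml file stands in for "the job information". The directory layout, file name, and function names are illustrative only and do not reflect MRAppMaster's actual code; the sketch only reproduces the ordering in which the stale first attempt deletes shared state that the second attempt still needs.

```python
import os
import shutil
import tempfile

# Hypothetical staging layout: <staging_root>/<job_id>/job.xml.
# A local directory stands in for HDFS; real Hadoop paths differ.


def submit_job(staging_root: str, job_id: str) -> str:
    """Create the per-job directory that every AM attempt reads."""
    job_dir = os.path.join(staging_root, job_id)
    os.makedirs(job_dir)
    with open(os.path.join(job_dir, "job.xml"), "w") as f:
        f.write("<configuration/>")
    return job_dir


def attempt_reads_job_info(job_dir: str) -> str:
    """A (hypothetical) AM attempt reading the shared job information."""
    with open(os.path.join(job_dir, "job.xml")) as f:
        return f.read()


def attempt_cleans_up(job_dir: str) -> None:
    """The stale first attempt deleting the job information on exit."""
    shutil.rmtree(job_dir)


def simulate(staging_root: str, job_id: str = "job_0001") -> str:
    job_dir = submit_job(staging_root, job_id)
    # Attempt 1 keeps running because its JVM survived the NM kill.
    attempt_reads_job_info(job_dir)
    # The RM launches attempt 2; meanwhile attempt 1 finishes and cleans up.
    attempt_cleans_up(job_dir)
    # Attempt 2 now finds the shared job information missing.
    try:
        attempt_reads_job_info(job_dir)
        return "attempt 2 ok"
    except FileNotFoundError:
        return "attempt 2 failed: FileNotFoundError"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(simulate(root))
```

Nothing in the sketch prevents the first attempt from deleting the directory while a newer attempt depends on it, which is the condition this report describes.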
Issue Links
- duplicates: MAPREDUCE-5396 Application is "FAILED" when multiple appmaster attempts are spawned (Open)