[YARN-929] 2 MRAppMaster running parallely for same Job Id - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.0.5-alpha
Fix Version/s: None
Component/s: resourcemanager
Labels: None

Description

Configuration:
yarn.resourcemanager.am.max-retries = 3

Scenario:
The NodeManager is killed forcefully, i.e. with kill -9 NM_PID.
After the node expires, the RM kills all the containers that were running on this NodeManager.
However, the MRAppMaster JVM is still running.
The RM spawns a second MRAppMaster attempt, since AM retries are configured as 3. At this point, two MRAppMasters are running in parallel for the same job ID.

The problem with running two MRAppMasters is that the first-attempt AppMaster deletes the job information from HDFS, which causes a FileNotFoundException in the second-attempt MRAppMaster.
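
The failure mode in the description can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Hadoop code: a local temporary directory stands in for the job's staging area on HDFS, and the names run_attempt, simulate, job_0001 and job.xml are hypothetical placeholders chosen for illustration. It shows that once the first attempt cleans up the shared job files, a second attempt that reads the same location fails with a file-not-found error.

```python
import shutil
import tempfile
from pathlib import Path


def run_attempt(staging_dir: Path, cleanup: bool) -> str:
    """Read the job file as an AM attempt might, then optionally clean up."""
    job_file = staging_dir / "job.xml"  # hypothetical file name
    content = job_file.read_text()  # raises FileNotFoundError if staging is gone
    if cleanup:
        # Stand-in for the first attempt deleting the job information.
        shutil.rmtree(staging_dir)
    return content


def simulate() -> str:
    """Run two attempts against one shared staging directory."""
    root = Path(tempfile.mkdtemp())
    staging = root / "staging" / "job_0001"  # hypothetical job directory
    staging.mkdir(parents=True)
    (staging / "job.xml").write_text("<configuration/>")
    try:
        # Attempt 1 outlived its NodeManager and eventually cleans up.
        run_attempt(staging, cleanup=True)
        # Attempt 2, started in the meantime, looks for the same files.
        try:
            run_attempt(staging, cleanup=False)
            return "attempt 2 succeeded"
        except FileNotFoundError:
            return "attempt 2 failed: job files missing"
    finally:
        shutil.rmtree(root, ignore_errors=True)


print(simulate())
```

In the real scenario the two attempts run concurrently, so the timing of the deletion decides when the second attempt fails; the sequential ordering here is a simplification.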

Issue Links

duplicates: MAPREDUCE-5396 Application is "FAILED" when multiple appmaster attempts are spawned (Status: Open)

People

Assignee: Unassigned
Reporter: Rohith Sharma K S
Votes: 0
Watchers: 4

Dates

Created: 16/Jul/13 13:11
Updated: 16/Jul/13 14:05
Resolved: 16/Jul/13 14:05