[YARN-1373] Transition RMApp and RMAppAttempt state to RUNNING after restart for recovered running apps - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: resourcemanager
Labels: None

Description

Currently, the RM moves recovered app attempts to a terminal recovered state and starts a new attempt. Instead, it will have to transition the last attempt to a running state so that it can proceed as normal once the running attempt has resynced with the ApplicationMasterService (~~YARN-1365~~ and ~~YARN-1366~~). If the RM had started the application container before dying, then the AM would be up and trying to contact the RM. However, the RM may have died before launching the container. In that case, the RM should wait for the AM liveness period and then issue a kill for the stored master container. It should then transition this attempt to some RECOVER_ERROR state and proceed to start a new attempt.
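The proposed behaviour can be sketched as a small state machine. This is only an illustration of the description above, not the actual RMAppAttempt implementation: the class `RecoveredAttemptSketch`, its methods, the `RECOVERED` starting state, and the action strings are all hypothetical names chosen for this example. Only RUNNING and RECOVER_ERROR are taken from the issue text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: none of these names come from the YARN code base.
final class RecoveredAttemptSketch {
    // RECOVERED is an assumed holding state for an attempt loaded after RM restart.
    enum State { RECOVERED, RUNNING, RECOVER_ERROR }

    private final long recoveredAtMs;        // when the RM recovered this attempt
    private final long amLivenessMs;         // how long to wait for the AM to resync
    private final String masterContainerId;  // stored master container of the attempt
    private final List<String> actions = new ArrayList<>();
    private State state = State.RECOVERED;

    RecoveredAttemptSketch(long recoveredAtMs, long amLivenessMs, String masterContainerId) {
        this.recoveredAtMs = recoveredAtMs;
        this.amLivenessMs = amLivenessMs;
        this.masterContainerId = masterContainerId;
    }

    // The AM was already up and has resynced: resume the attempt as RUNNING.
    void onAmResync() {
        if (state == State.RECOVERED) {
            state = State.RUNNING;
        }
    }

    // Periodic check: if the AM never resynced within the liveness period,
    // kill the stored master container and hand over to a new attempt.
    void onLivenessCheck(long nowMs) {
        if (state != State.RECOVERED || nowMs - recoveredAtMs < amLivenessMs) {
            return;
        }
        actions.add("KILL_CONTAINER " + masterContainerId);
        state = State.RECOVER_ERROR;
        actions.add("START_NEW_ATTEMPT");
    }

    State state() { return state; }

    List<String> actions() { return Collections.unmodifiableList(actions); }

    public static void main(String[] args) {
        RecoveredAttemptSketch attempt = new RecoveredAttemptSketch(0L, 1000L, "container_01");
        attempt.onLivenessCheck(1000L); // the AM never resynced
        System.out.println(attempt.state() + " " + attempt.actions());
    }
}
```

In this sketch, a resync that arrives before the liveness period expires moves the attempt to RUNNING, and later liveness checks become no-ops. The kill is recorded before the new attempt is requested, mirroring the order given in the description.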

Issue Links

is duplicated by

YARN-1210 During RM restart, RM should start a new attempt only when previous attempt exits for real

Closed

People

Assignee: Omkar Vinit Joshi

Reporter: Bikas Saha

Votes: 0

Watchers: 6

Dates

Created: 30/Oct/13 07:21

Updated: 27/Jun/14 19:34

Resolved: 17/Jun/14 21:43