Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Today, if an AM goes down:
- RM kills all the containers of that ApplicationAttempt
- New ApplicationAttempt doesn't know where the previous containers are running
- Old running containers don't know where the new AM is running.
We need to fix this to enable work-preserving AM restart. The latter two can potentially be handled at the application level, but it is better to have a common solution for all apps wherever possible.
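To make the three problems concrete, here is a minimal toy model in Python. It is only an illustration: `ToyRM`, `keep_containers`, `am_failed`, and `lookup_am` are hypothetical names invented for this sketch and are not YARN APIs. It contrasts today's behavior (containers killed when the attempt fails) with a work-preserving restart, where the new attempt is told about surviving containers and containers can look up the new AM's address.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRM:
    """Hypothetical stand-in for the RM; not a real YARN class."""
    keep_containers: bool = False          # assumed switch for work-preserving restart
    attempt: int = 1
    am_address: str = "am-host-1:8030"     # made-up address for illustration
    running: dict = field(default_factory=dict)  # container id -> node

    def launch(self, container_id: str, node: str) -> None:
        self.running[container_id] = node

    def am_failed(self, new_am_address: str) -> dict:
        """Start a new attempt and return the containers handed to the new AM."""
        if not self.keep_containers:
            # Today's behavior: all containers of the failed attempt are killed.
            self.running.clear()
        self.attempt += 1
        self.am_address = new_am_address
        # The new attempt learns where the surviving containers run.
        return dict(self.running)

    def lookup_am(self) -> str:
        """Let a running container discover the current AM."""
        return self.am_address


# Today's behavior: nothing survives the AM failure.
rm = ToyRM(keep_containers=False)
rm.launch("container_01", "node-a")
print(rm.am_failed("am-host-2:8030"))

# Work-preserving restart: the container survives and both sides can find each other.
rm = ToyRM(keep_containers=True)
rm.launch("container_01", "node-a")
print(rm.am_failed("am-host-2:8030"), rm.lookup_am())
```

In this sketch a single object carries both directions of discovery (new AM to old containers, old containers to new AM), which mirrors the argument for solving them in one common place rather than separately in each app.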
Attachments
Issue Links
- incorporates: YARN-4758 Enable discovery of AMs by containers (Open)
- is depended upon by: YARN-896 Roll up for long-lived services in YARN (Open)
- is related to: MAPREDUCE-6608 Work Preserving AM Restart for MapReduce (Open)
- relates to: TWILL-256 Support fault tolerant AM restart (Open)