Affects Version/s: None
Fix Version/s: None
Today, if an AM goes down:
- The RM kills all the containers of that ApplicationAttempt.
- The new ApplicationAttempt doesn't know where the previous containers are running.
- The old running containers don't know where the new AM is running.
We need to fix this to enable work-preserving AM restart. The latter two can potentially be handled at the app level, but it is better to have a common solution for all apps wherever possible.
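To make the three gaps concrete, here is a rough toy model of the desired behaviour. It is only a sketch and not YARN code: `ToyResourceManager`, `keep_containers`, `register_am`, and `lookup_am` are hypothetical names invented for illustration. It assumes the RM keeps containers alive when an AM fails, hands them to the next attempt at registration, and lets containers ask for the current AM address.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Container:
    container_id: str
    node: str


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for the RM; not the real YARN API."""
    keep_containers: bool = True          # assumed opt-in switch
    attempt: int = 0
    am_address: Optional[str] = None
    running: Dict[str, Container] = field(default_factory=dict)

    def allocate(self, container_id: str, node: str) -> Container:
        # Record a container as running on the given node.
        container = Container(container_id, node)
        self.running[container_id] = container
        return container

    def am_failed(self) -> None:
        # Current behaviour: kill every container of the attempt.
        # Work-preserving behaviour: leave them running.
        self.am_address = None
        if not self.keep_containers:
            self.running.clear()

    def register_am(self, address: str) -> List[Container]:
        # A new attempt registers and is told which containers
        # survived from previous attempts.
        self.attempt += 1
        self.am_address = address
        return sorted(self.running.values(), key=lambda c: c.container_id)

    def lookup_am(self) -> Optional[str]:
        # Containers call this to discover the current AM, if any.
        return self.am_address
```

With `keep_containers=False` the model reproduces the current behaviour (the second attempt starts with no containers). With `keep_containers=True` the second attempt receives the surviving containers from `register_am`, and those containers can find the new AM through `lookup_am`.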
|1.||Add restart support for Unmanaged AMs||Open||Unassigned|
|2.||Revisit how AMs learn of containers from previous attempts||Open||Unassigned|
|3.||Enable discovery of AMs by containers||Open|