Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
On app master failure, the streaming containers should continue running.
As of Hadoop 2.2, YARN automatically terminates all containers when the app master fails, and the replacement app master relaunches them. Once we move to a newer minimum Hadoop version, we should leverage work-preserving restart.
The mechanism that lets Apex containers locate the new master process is already in place.
Test Cases:
1. Kill the app master: only the app master's container ID should change; all other container IDs should remain the same.
2. Kill the app master and a few other containers: verify that the killed containers are recovered.
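The checks in both test cases amount to comparing container IDs captured before and after the kill. The following is a minimal sketch of that comparison, not Apex or YARN code: the class `RestartCheck`, its methods, the role names, and the container IDs are all hypothetical and only illustrate the expected outcome.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Apex or YARN.
final class RestartCheck {

    // Returns the roles whose container ID differs between the two snapshots.
    // A role missing from the "after" snapshot also counts as changed.
    static Set<String> changedRoles(Map<String, String> before, Map<String, String> after) {
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, String> e : before.entrySet()) {
            String newId = after.get(e.getKey());
            if (!e.getValue().equals(newId)) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }

    // True when exactly the killed roles received new container IDs and every
    // role from the "before" snapshot is present again afterwards.
    static boolean onlyKilledRestarted(Map<String, String> before,
                                       Map<String, String> after,
                                       Set<String> killed) {
        return after.keySet().containsAll(before.keySet())
                && changedRoles(before, after).equals(new TreeSet<>(killed));
    }

    public static void main(String[] args) {
        // Invented IDs in the YARN container ID layout; attempt 02 marks the new app master.
        Map<String, String> before = new LinkedHashMap<>();
        before.put("appMaster", "container_1400000000000_0001_01_000001");
        before.put("operatorA", "container_1400000000000_0001_01_000002");
        before.put("operatorB", "container_1400000000000_0001_01_000003");

        Map<String, String> after = new LinkedHashMap<>(before);
        after.put("appMaster", "container_1400000000000_0001_02_000001");

        // Test case 1: only the app master container ID may change.
        System.out.println(onlyKilledRestarted(before, after, Set.of("appMaster")));
    }
}
```

For test case 2, the same check applies with the additionally killed containers included in the `killed` set; any surviving container that shows a new ID indicates the restart was not work-preserving.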
Issue Links
- is related to YARN-1490: RM should optionally not kill all containers when an ApplicationMaster exits (Closed)
- links to