Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 0.11.0
- Fix Version/s: None
- Component/s: None
- Labels: None
Description
To replicate:
- Make sure that YARN RM recovery is enabled
- Deploy a test job
- Terminate the YARN RM
- Wait until the AM of the job terminates with:
2017-02-02 13:08:04 RetryInvocationHandler [WARN] Exception while invoking class org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster over rm2. Not retrying because failovers (30) exceeded maximum allowed (30)
- Restart the RM
The job should get a new attempt, but the containers from the old attempt are not terminated, so duplicate containers end up running.
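For the first replication step, RM recovery is normally switched on in yarn-site.xml. The fragment below is only a sketch of such a setup, not the reporter's actual configuration: the ZooKeeper-backed state store, the placeholder ZooKeeper address, and the rm1/rm2 HA ids are assumptions (the log line mentions rm2, which suggests an HA pair).

```xml
<!-- yarn-site.xml (sketch; values below are assumptions, not taken from the report) -->
<configuration>
  <!-- Let the RM persist application state and recover it after a restart -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <!-- Assumed state store; a file-system-based store is another option -->
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <!-- Placeholder ZooKeeper quorum; replace with real hosts -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>zk1.example.com:2181</value>
  </property>
  <!-- Assumed HA pair, inferred from "rm2" in the log line -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
</configuration>
```

With recovery enabled, the restarted RM reloads the application from the state store and starts a new attempt, which is the point at which the leftover containers from the previous attempt become duplicates.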
Issue Links
- is related to SAMZA-871 Implement heart-beat mechanism between JobCoordinator and all running containers (Resolved)