[SAMZA-1116] Yarn RM recovery causing duplicate containers - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.11.0
Fix Version/s: None
Component/s: None
Labels: None

Description

To replicate:

1. Make sure that Yarn RM recovery is enabled.
2. Deploy a test job.
3. Terminate the Yarn RM.
4. Wait until the job's AM terminates with:

   2017-02-02 13:08:04 RetryInvocationHandler [WARN] Exception while invoking class org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster over rm2. Not retrying because failovers (30) exceeded maximum allowed (30)

5. Restart the RM.

The job should then get a new attempt, but the containers from the old attempt are not terminated, so duplicate containers end up running.
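
For the first replication step, RM recovery is normally switched on in yarn-site.xml. The fragment below is a minimal sketch, not the reporter's actual configuration: the report does not say which state store was in use, so the ZooKeeper-backed store shown here is an assumption.

```xml
<!-- yarn-site.xml (sketch): enables ResourceManager recovery. -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<!-- Assumption: ZooKeeper state store; the report does not name the store used. -->
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
```

The "over rm2" in the log line suggests the cluster was also running RM high availability, which would need its own settings beyond this fragment.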

Issue Links

is related to: SAMZA-871 Implement heart-beat mechanism between JobCoordinator and all running containers (Resolved)

People

Assignee: Unassigned

Reporter: Danil Serdyuchenko

Votes: 1

Watchers: 2

Dates

Created: 02/Mar/17 14:13

Updated: 20/Dec/17 21:38