[YARN-6349] Container kill request from AM can be lost if container is still recovering - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: nodemanager
Labels: None

Description

If container recovery takes an excessive amount of time (e.g., because HDFS is slow), the NM could start servicing requests before all containers have been recovered. If an AM tries to kill a container that is still recovering, the kill request could be lost.
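
The ordering problem described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `RecoveryRaceSketch`, `ToyNodeManager`, `kill`, `finishRecovery`, and the `rememberKills` flag are hypothetical names invented for illustration and are not NodeManager classes or APIs. The `rememberKills` case is included purely as a contrast and is not a fix proposed in this issue.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: every name here is hypothetical, not actual NodeManager code.
public class RecoveryRaceSketch {
    enum State { RUNNING, KILLED }

    static class ToyNodeManager {
        // Containers become visible here only once their recovery completes.
        private final Map<String, State> containers = new HashMap<>();
        // Kill requests seen for containers that are not yet known
        // (populated only when rememberKills is true).
        private final Set<String> pendingKills = new HashSet<>();
        private final boolean rememberKills;

        ToyNodeManager(boolean rememberKills) {
            this.rememberKills = rememberKills;
        }

        // Models a kill request from an AM arriving at the NM.
        void kill(String id) {
            if (containers.containsKey(id)) {
                containers.put(id, State.KILLED);
            } else if (rememberKills) {
                pendingKills.add(id);
            }
            // Otherwise the container is unknown and the request is silently dropped.
        }

        // Models recovery of a single container completing.
        void finishRecovery(String id) {
            boolean killRequested = pendingKills.remove(id);
            containers.put(id, killRequested ? State.KILLED : State.RUNNING);
        }

        State stateOf(String id) {
            return containers.get(id);
        }
    }

    // Scenario from the description: the kill arrives before recovery completes.
    static State killDuringRecovery(boolean rememberKills) {
        ToyNodeManager nm = new ToyNodeManager(rememberKills);
        nm.kill("container_1");           // request serviced while still recovering
        nm.finishRecovery("container_1"); // recovery completes afterwards
        return nm.stateOf("container_1");
    }

    public static void main(String[] args) {
        System.out.println("kill dropped:    " + killDuringRecovery(false));
        System.out.println("kill remembered: " + killDuringRecovery(true));
    }
}
```

In this toy model, the first scenario leaves the container `RUNNING` even though a kill was requested, which mirrors the lost request described above. The second scenario ends in `KILLED` only because the model keeps the early request around until recovery finishes.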

Issue Links

relates to: YARN-4051 ContainerKillEvent lost when container is still recovering and application finishes (Resolved)

People

Assignee: Unassigned

Reporter: Jason Darrell Lowe

Votes: 0

Watchers: 5

Dates

Created: 16/Mar/17 14:40

Updated: 16/Mar/17 14:41