[YARN-6153] keepContainer does not work when AM retry window is set - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.7.1
Fix Version/s: 2.9.0, 3.0.0-alpha4
Component/s: resourcemanager
Labels: None
Target Version/s: 2.9.0, 3.0.0-alpha4

Description

yarn.resourcemanager.am.max-attempts is set to 2 in my cluster.
I submitted a YARN application (a Slider app) with keepContainers=true and attemptFailuresValidityInterval=300000.

It worked properly the first time the AM failed:
all containers launched by the previous AM were resynced with the new AM (attempt 2) without being killed.

The AM failed a second time 10 minutes later. I expected the AM failure count to have been reset by then, because attemptFailuresValidityInterval is 5 minutes.
However, all containers were killed on the second AM failure, even though a new AM (attempt 3) was launched properly.
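
The expected behavior can be illustrated with a small standalone sketch. It is not the ResourceManager's code: the class `AmRetryWindowSketch` and the helpers `failuresInWindow` and `shouldKeepContainers` are hypothetical names introduced here, and the failure timestamps (0 and 10 minutes) are assumed from the description above. The sketch contrasts counting only the failures inside the validity interval with counting every failure, which is one plausible way the reported symptom could arise.

```java
import java.util.Arrays;
import java.util.List;

public class AmRetryWindowSketch {

    /**
     * Counts the failures that are still inside the validity window at nowMs.
     * Assumption: a non-positive interval means no window, so every failure counts.
     */
    static int failuresInWindow(List<Long> failureTimesMs, long nowMs, long validityIntervalMs) {
        int count = 0;
        for (long t : failureTimesMs) {
            if (validityIntervalMs <= 0 || nowMs - t < validityIntervalMs) {
                count++;
            }
        }
        return count;
    }

    /**
     * Hypothetical rule: containers survive an AM failure only if
     * keepContainers is set and another attempt is still allowed.
     */
    static boolean shouldKeepContainers(boolean keepContainers, int countedFailures, int maxAttempts) {
        return keepContainers && countedFailures < maxAttempts;
    }

    public static void main(String[] args) {
        int maxAttempts = 2;               // yarn.resourcemanager.am.max-attempts
        long validityIntervalMs = 300_000L; // attemptFailuresValidityInterval (5 minutes)

        // Assumed timeline: first AM failure at t=0, second at t=10 minutes.
        List<Long> failures = Arrays.asList(0L, 600_000L);
        long now = 600_000L;

        int windowed = failuresInWindow(failures, now, validityIntervalMs);
        int total = failures.size();

        System.out.println("windowed failures=" + windowed
                + ", keep containers=" + shouldKeepContainers(true, windowed, maxAttempts));
        System.out.println("total failures=" + total
                + ", keep containers=" + shouldKeepContainers(true, total, maxAttempts));
    }
}
```

With the windowed count, the first failure has expired by the time of the second one, so the containers would be kept. With the total count, the second failure looks like the last allowed attempt and the containers would be released, which matches what was observed.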

Attachments

YARN-6153-branch-2.8.patch (24 kB), 03/Mar/17 06:32, kyungwan nam
YARN-6153.006.patch (25 kB), 28/Feb/17 07:27, kyungwan nam
YARN-6153.005.patch (23 kB), 24/Feb/17 08:17, kyungwan nam
YARN-6153.004.patch (21 kB), 23/Feb/17 03:22, kyungwan nam
YARN-6153.003.patch (10 kB), 21/Feb/17 09:55, kyungwan nam
YARN-6153.002.patch (10 kB), 16/Feb/17 09:59, kyungwan nam
YARN-6153.001.patch (2 kB), 08/Feb/17 02:21, kyungwan nam

People

Assignee: kyungwan nam
Reporter: kyungwan nam
Votes: 0
Watchers: 8

Dates

Created: 07/Feb/17 02:04
Updated: 21/Apr/17 07:08
Resolved: 21/Apr/17 07:08