[YARN-1408] Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.2.0
Fix Version/s: 2.5.0
Component/s: resourcemanager
Labels: None
Target Version/s: 2.5.0

Description

Capacity scheduler preemption is enabled as follows:

yarn.resourcemanager.scheduler.monitor.enable=true
yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy

Queues = a, b
Capacity of queue a = 80%
Capacity of queue b = 20%
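
For readers who want to reproduce the setup, the settings above can be collected as plain key/value pairs. This is only a sketch: the two `yarn.resourcemanager.scheduler.monitor.*` keys come from the description, while the `yarn.scheduler.capacity.root.*` keys are an assumption about how queues a and b would be declared under the root queue in capacity-scheduler.xml. The class name `PreemptionConfigSketch` is hypothetical, and `java.util.Properties` stands in for Hadoop's own configuration classes.

```java
import java.util.Properties;

public class PreemptionConfigSketch {

    /** Collects the settings from the description as plain key/value pairs. */
    static Properties build() {
        Properties p = new Properties();
        // Taken from the issue description: enable the scheduler monitor
        // and select the proportional capacity preemption policy.
        p.setProperty("yarn.resourcemanager.scheduler.monitor.enable", "true");
        p.setProperty("yarn.resourcemanager.scheduler.monitor.policies",
            "org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity."
                + "ProportionalCapacityPreemptionPolicy");
        // Assumption: queues a and b sit directly under the root queue.
        p.setProperty("yarn.scheduler.capacity.root.queues", "a,b");
        p.setProperty("yarn.scheduler.capacity.root.a.capacity", "80");
        p.setProperty("yarn.scheduler.capacity.root.b.capacity", "20");
        return p;
    }

    public static void main(String[] args) {
        // Print every setting in key=value form.
        build().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```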

Step 1: Submit a large jobA to queue a, which uses the full cluster capacity.
Step 2: Submit a jobB to queue b, which would use less than 20% of the cluster capacity.

The jobA tasks that were using queue b's capacity are preempted and killed.
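
The amount reclaimed in this scenario can be illustrated with simple arithmetic. The sketch below is a deliberately simplified model and not the logic of ProportionalCapacityPreemptionPolicy: it assumes capacities are expressed as whole percentages of the cluster, and the 15% demand for jobB is a hypothetical figure chosen only because the description says "less than 20%". The class and method names are likewise hypothetical.

```java
public class PreemptionArithmeticSketch {

    /**
     * Simplified estimate of how much to reclaim from an over-capacity queue:
     * the smaller of (a) what that queue holds beyond its guarantee and
     * (b) what the starved queue asks for, capped at its own guarantee.
     */
    static int toPreempt(int usedByA, int guaranteedA, int demandB, int guaranteedB) {
        int overCapacity = Math.max(0, usedByA - guaranteedA);
        int entitled = Math.min(demandB, guaranteedB);
        return Math.min(overCapacity, entitled);
    }

    public static void main(String[] args) {
        // Queue a uses 100% with an 80% guarantee; jobB asks for a
        // hypothetical 15% against queue b's 20% guarantee.
        System.out.println(toPreempt(100, 80, 15, 20) + "% of the cluster is reclaimed from queue a");
    }
}
```

With these numbers queue a is 20% over its guarantee, jobB is entitled to its full 15%, and so 15% of the cluster would be taken back from jobA.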

This caused the following problem:
1. A new container was allocated for jobA in queue a on a node update from an NM.
2. This container was preempted immediately by the preemption policy.

An "ACQUIRED at KILLED" invalid state exception was then raised when the next AM heartbeat reached the RM.
ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED

This also caused the task to time out after 30 minutes, because its container had already been killed by preemption.
attempt_1380289782418_0003_m_000000_0 Timed out after 1800 secs
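
The sequence described above (a container is allocated, killed by preemption before the AM picks it up, then reported as acquired on the next AM heartbeat) can be modeled with a toy state machine. This is a minimal sketch for illustration only, not the RMContainerImpl code: the class `ContainerStateSketch`, its reduced set of states and events, and the `InvalidTransition` exception are all hypothetical stand-ins, and the transition table is an assumption that the killed state accepts no ACQUIRED event.

```java
import java.util.EnumMap;
import java.util.Map;

public class ContainerStateSketch {
    enum State { ALLOCATED, ACQUIRED, KILLED }
    enum Event { ACQUIRED, KILL }

    /** Hypothetical stand-in for an invalid state transition error. */
    static class InvalidTransition extends RuntimeException {
        InvalidTransition(String message) { super(message); }
    }

    // Toy transition table: state -> (event -> next state).
    private static final Map<State, Map<Event, State>> TRANSITIONS = new EnumMap<>(State.class);
    static {
        Map<Event, State> fromAllocated = new EnumMap<>(Event.class);
        fromAllocated.put(Event.ACQUIRED, State.ACQUIRED); // AM pulled the container
        fromAllocated.put(Event.KILL, State.KILLED);       // preempted before the AM saw it
        TRANSITIONS.put(State.ALLOCATED, fromAllocated);

        Map<Event, State> fromAcquired = new EnumMap<>(Event.class);
        fromAcquired.put(Event.KILL, State.KILLED);
        TRANSITIONS.put(State.ACQUIRED, fromAcquired);

        // Assumption: KILLED has no outgoing transitions in this model.
        TRANSITIONS.put(State.KILLED, new EnumMap<>(Event.class));
    }

    private State state = State.ALLOCATED;

    /** Applies an event, or throws if the current state does not accept it. */
    State handle(Event event) {
        State next = TRANSITIONS.get(state).get(event);
        if (next == null) {
            throw new InvalidTransition("Invalid event: " + event + " at " + state);
        }
        state = next;
        return state;
    }

    public static void main(String[] args) {
        ContainerStateSketch container = new ContainerStateSketch();
        container.handle(Event.KILL); // preemption kills the freshly allocated container
        try {
            container.handle(Event.ACQUIRED); // the next AM heartbeat arrives too late
        } catch (InvalidTransition e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the error is a symptom rather than the root problem: the AM is never told that the container it believes it acquired is already dead, which is consistent with the task waiting until its timeout expires.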

Attachments

YARN-1408-branch-2.5-1.patch, 15/Jul/14 23:47, 38 kB, Mayank Bansal
Yarn-1408.patch, 13/Nov/13 06:14, 2 kB, Sunil G
Yarn-1408.9.patch, 11/Jul/14 15:46, 36 kB, Sunil G
Yarn-1408.8.patch, 10/Jul/14 17:14, 35 kB, Sunil G
Yarn-1408.7.patch, 04/Jul/14 02:13, 36 kB, Sunil G
Yarn-1408.6.patch, 30/Jun/14 13:07, 30 kB, Sunil G
Yarn-1408.5.patch, 27/Jun/14 16:57, 27 kB, Sunil G
Yarn-1408.4.patch, 14/Feb/14 11:40, 7 kB, Sunil G
Yarn-1408.3.patch, 14/Feb/14 10:12, 6 kB, Sunil G
Yarn-1408.2.patch, 31/Dec/13 05:09, 1 kB, Sunil G
Yarn-1408.11.patch, 15/Jul/14 08:09, 36 kB, Sunil G
Yarn-1408.10.patch, 12/Jul/14 09:39, 36 kB, Sunil G
Yarn-1408.1.patch, 13/Nov/13 10:26, 2 kB, Sunil G

Issue Links

is blocked by: YARN-2143 Merge common killContainer logic of Fair/Capacity scheduler into AbstractYarnScheduler (Open)

People

Assignee: Sunil G
Reporter: Sunil G
Votes: 0
Watchers: 13

Dates

Created: 13/Nov/13 06:11
Updated: 15/Aug/14 05:44
Resolved: 15/Jul/14 23:47