Hadoop YARN / YARN-6029

CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by one thread and LeafQueue#assignContainers is releasing excessive reserved container by another thread

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.8.0
    • Component/s: capacityscheduler
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      When ParentQueue#getQueueUserAclInfo is called (e.g. a client calls YarnClient#getQueueAclsInfo) just as LeafQueue#assignContainers is running and has not yet notified the parent queue to release resources (for an excess reserved container that should be released), the ResourceManager can deadlock. I found this problem in our testing environment for Hadoop 2.8.

      Steps to reproduce the deadlock, in chronological order:

      • 1. Thread A (ResourceManager Event Processor) calls synchronized LeafQueue#assignContainers (takes the LeafQueue instance lock of queue root.a)
      • 2. Thread B (IPC Server handler) calls synchronized ParentQueue#getQueueUserAclInfo (takes the ParentQueue instance lock of queue root), iterates over the children's queue ACLs, and blocks when calling synchronized LeafQueue#getQueueUserAclInfo (the LeafQueue instance lock of queue root.a is held by Thread A)
      • 3. Thread A wants to inform the parent queue that a container is being completed and blocks when invoking the synchronized ParentQueue#internalReleaseResource method (the ParentQueue instance lock of queue root is held by Thread B)

      I think the synchronized modifier of LeafQueue#getQueueUserAclInfo can be removed to solve this problem, since this method does not appear to modify any fields of the LeafQueue instance.
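The lock-order inversion described above can be reproduced deterministically in miniature. The sketch below is illustrative, not real YARN code: `parent` stands for the root ParentQueue monitor and `leaf` for the root.a LeafQueue monitor, and ReentrantLock#tryLock is used so the demo detects the inversion instead of actually hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockInversionDemo {

    static final ReentrantLock parent = new ReentrantLock(); // root's monitor
    static final ReentrantLock leaf = new ReentrantLock();   // root.a's monitor

    public static boolean inversionObservable() {
        CountDownLatch leafHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Thread A: LeafQueue#assignContainers, which later calls up
        // into ParentQueue#internalReleaseResource.
        Thread schedulerThread = new Thread(() -> {
            leaf.lock();                       // step 1: take root.a's lock
            try {
                leafHeld.countDown();
                release.await();               // wait until B has probed the leaf lock
                parent.lock();                 // step 3: with real monitors this blocks
                parent.unlock();
            } catch (InterruptedException ignored) {
            } finally {
                leaf.unlock();
            }
        });
        schedulerThread.start();

        try {
            leafHeld.await();
            // Thread B (here: the main thread): ParentQueue#getQueueUserAclInfo.
            parent.lock();                     // step 2: take root's lock
            boolean gotLeaf;
            try {
                // With synchronized methods this would block forever; tryLock
                // lets us observe that the leaf lock is unavailable.
                gotLeaf = leaf.tryLock(100, TimeUnit.MILLISECONDS);
                if (gotLeaf) {
                    leaf.unlock();
                }
            } finally {
                parent.unlock();
            }
            release.countDown();
            schedulerThread.join();
            return !gotLeaf;                   // true: each thread held what the other needed
        } catch (InterruptedException e) {
            return false;
        }
    }
}
```

With real synchronized monitors neither thread could ever back off, which is exactly the deadlock captured in the attached jstack.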

      Attaching a patch with a unit test for review.

      1. deadlock.jstack
        187 kB
        Tao Yang
      2. YARN-6029.001.patch
        9 kB
        Tao Yang
      3. YARN-6029.002.patch
        10 kB
        Tao Yang
      4. YARN-6029-branch-2.8.001.patch
        11 kB
        Tao Yang

        Activity

        Naganarasimha G R added a comment -

        Thanks for working on the patch, Tao Yang. What's actually happening is an inversion of the lock order: in one flow we hold the lock of the leaf and try to take the lock of the parent (the completedContainer flow), and in the other we hold the lock of the parent and then try to take the lock of the leaf (the getQueueUserAclInfo flow). A better solution would be to do as 2.9/trunk does, where read locks were introduced so that getQueueUserAclInfo takes a read lock and completedContainer takes a write lock, avoiding this inversion. That would be a big change, though, and I am not completely sure about your fix, since you are only removing the modifier from LeafQueue.

        Junping Du added a comment -

        Thanks Tao Yang for reporting the issue. I think the issue is valid, given the existing code flow and what your jstack shows. As for your current patch, I am a little concerned that removing synchronized from getQueueUserAclInfo entirely could cause other concurrency issues.
        However, Naganarasimha G R, I don't quite understand your proposed solution here: if we make exactly the same change as trunk/branch-2.9, thread A (the completedContainer flow) can hold the write lock on queue root.a while waiting for the write lock on queue root, and thread B (the getQueueUserAclInfo flow) can hold the read lock on queue root while waiting for the read lock on queue root.a. Nothing gets better. Am I missing anything here?
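Junping Du's premise here — that a pending writer must still wait for outstanding readers, so read/write locks alone do not break the cycle — can be checked with a minimal java.util.concurrent sketch (illustrative names, not YARN code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadBlocksWriteDemo {

    public static boolean writerBlockedByReader() {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        CountDownLatch readerIn = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(1);

        // Stand-in for getQueueUserAclInfo holding a queue's read lock.
        Thread reader = new Thread(() -> {
            rw.readLock().lock();
            readerIn.countDown();
            try {
                done.await();
            } catch (InterruptedException ignored) {
            }
            rw.readLock().unlock();
        });
        reader.start();

        try {
            readerIn.await();
            // Stand-in for completedContainer needing the write lock:
            // it cannot proceed while any reader holds the lock.
            boolean gotWrite = rw.writeLock().tryLock();
            if (gotWrite) {
                rw.writeLock().unlock();
            }
            done.countDown();
            reader.join();
            return !gotWrite;   // true: the writer had to wait for the reader
        } catch (InterruptedException e) {
            return false;
        }
    }
}
```

So converting each queue's monitor to a ReentrantReadWriteLock, by itself, would still allow the circular wait; the later comments explain that YARN-5706's larger restructuring is what actually resolves it in 2.9/trunk.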

        Wangda Tan added a comment -

        Thanks Tao Yang for reporting this issue.

        Naganarasimha G R, branch-2/trunk solves the problem after YARN-5706.

        However, backporting YARN-5706 to fix this issue would take a huge effort; I don't think it is even an option.

        We can make some changes to LeafQueue:

        1. Remove synchronized lock of assignContainers
        2. Make changes:

        # BEGINNING of LeafQueue#assignContainers
        synchronized {
           // do stuffs
        }
        
        call-complete-containers (which locks parent) 
        
        synchronized {
           // do rest stuffs
        }
        # END of LeafQueue#assignContainers
        

        Removing synchronized entirely would cause data inconsistency on reads, and there are some other methods with the same pattern that need to change as well (grab the LeafQueue lock while holding the ParentQueue lock, and do not grab the CapacityScheduler's lock).
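The split-lock pattern sketched above can be written out as a compilable toy (class, method, and lock names are illustrative, not the real CapacityScheduler code). Because the restructured leaf method never holds its own monitor while taking the parent's, the only nested acquisition left is parent-then-leaf, so the lock graph is acyclic and the two flows cannot deadlock:

```java
public class SplitLockSketch {

    static final Object parentLock = new Object(); // stands for root's monitor
    static final Object leafLock = new Object();   // stands for root.a's monitor

    // LeafQueue#assignContainers, restructured per the comment above.
    static void assignContainers() {
        synchronized (leafLock) {
            // part 1: update resource limits, find the reserved container, ...
        }
        // The completed-container path locks the parent, but we no longer
        // hold our own monitor here, so no inversion is possible.
        synchronized (parentLock) {
            // ParentQueue#internalReleaseResource
        }
        synchronized (leafLock) {
            // part 2: the rest of the assignment work
        }
    }

    // ParentQueue#getQueueUserAclInfo: parent monitor, then each child's.
    static void getQueueUserAclInfo() {
        synchronized (parentLock) {
            synchronized (leafLock) {
                // LeafQueue#getQueueUserAclInfo
            }
        }
    }

    // Run both flows concurrently many times; with the split pattern
    // there is no lock cycle, so both threads always finish.
    public static boolean runsToCompletion() {
        Thread a = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) assignContainers();
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) getQueueUserAclInfo();
        });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            return false;
        }
        return true;
    }
}
```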

        Naganarasimha G R added a comment -

        Thanks Junping Du & Tan, Wangda for correcting me; I failed to realize earlier that a write lock has to wait until all read locks are released.
        Tan, Wangda, I agree your solution solves the problem, but the current flow is CapacityScheduler.allocateContainersToNode -> LeafQueue.assignContainers (holds the lock on the leaf) -> LeafQueue.handleExcessReservedContainer -> LeafQueue.completedContainer -> ParentQueue.completedContainer (tries to take the lock here).
        I agree we need to fix this flow, but a simpler temporary correction in ParentQueue (assuming that 2.9/trunk avoids the issue) could be:

        @Override
          public List<QueueUserACLInfo> getQueueUserAclInfo(
              UserGroupInformation user) {
            List<QueueUserACLInfo> userAcls = new ArrayList<QueueUserACLInfo>();
            synchronized (this) {
              // Add parent queue acls
              userAcls.add(getUserAclInfo(user));
            }
            // Add children queue acls
            for (CSQueue child : childQueues) {
              userAcls.addAll(child.getQueueUserAclInfo(user));
            }
        
            return userAcls;
          }
        

        Thoughts?

        Tao Yang added a comment -

        Thanks Naganarasimha G R, Junping Du and Wangda Tan for your suggestions.
        Naganarasimha G R, I think there may be a problem if childQueues is being iterated while ParentQueue#setChildQueues is called at the same time.
        Wangda Tan, I agree your solution solves the problem, but I still think the synchronized modifier on LeafQueue#getQueueUserAclInfo is not required. In my opinion this method doesn't affect the data structures of the LeafQueue instance (it checks the given user's permissions, creates a new QueueUserACLInfo instance and returns it), and it's only called by ParentQueue#getQueueUserAclInfo. As a reference, FSLeafQueue#getQueueUserAclInfo in the FairScheduler is not synchronized.
        Maybe I haven't spotted the potential problem; please correct me if I'm wrong.

        Li Lu added a comment -

        I'm not a scheduler expert, but "not affecting any data structure" sounds like the wrong reason not to synchronize. Tan, Wangda, will there be any potential data races according to the Java memory model [1]? If not, we can safely remove those synchronized keywords; otherwise we have to stick with them no matter how appealing removal appears to be.

        [1]: http://www.cs.umd.edu/~pugh/java/memoryModel/

        Tao Yang added a comment -

        Thanks Li Lu for correcting me; my wording was off, sorry about that. I did look for data races and found none, but I may have missed something.

        Wangda Tan added a comment -

        Thanks all for the comments.

        Tao Yang / Li Lu.

        Yes, removing the synchronized lock will not damage internal data structures, but it could cause inconsistent reads; for example, a queue ACL could be read while it is being updated. So I am not in favor of this solution.

        Naganarasimha G R,

        I still prefer to fix the issue in the scheduling logic. There are other similar code paths, such as getQueueInfo; we need to identify all of them, and they too could return inconsistent data while a queue is being refreshed at the same time.
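The inconsistent-read concern can be made concrete with a small sketch. This is a single-threaded simulation of one possible interleaving, with purely illustrative field names (the real queue ACL state lives elsewhere): an unsynchronized reader that runs between the two writes of an update observes a mixed ACL state that never existed as a whole.

```java
public class TornReadSketch {

    // Stand-ins for two pieces of a queue's ACL state that a
    // refresh updates together under the queue's lock.
    static String submitAcl = "userA";
    static String adminAcl = "userA";

    public static boolean tornReadPossible() {
        submitAcl = "userA";
        adminAcl = "userA";

        // Simulate the interleaving: the refresh has written the
        // first field of the update...
        submitAcl = "userB";

        // ...and an unsynchronized getQueueUserAclInfo-style reader
        // runs right now, before the second write lands:
        boolean torn = !submitAcl.equals(adminAcl);

        // The refresh then completes.
        adminAcl = "userB";
        return torn;   // true: the reader saw a half-applied update
    }
}
```

A synchronized (or read-locked) reader would have waited for the whole update, which is why simply deleting the keyword trades the deadlock for consistency bugs.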

        Li Lu added a comment -

        Thanks Tan, Wangda!

        But it could cause inconsistent reads; for example, a queue ACL could be read while it is being updated.

        Makes sense to me. Let's keep and fix the synchronized blocks then...

        Naganarasimha G R added a comment -

        Thanks Tan, Wangda & Tao Yang

        I think there may be a problem if childQueues is being iterated while ParentQueue#setChildQueues is called at the same time

        Yes, you are right: this happens during CS initialization or reinitialization, and if getQueueUserAclInfo is called during that window some anomalies can occur, since getQueueUserAclInfo does not hold the lock on the CS.

        But it could cause inconsistent reads; for example, a queue ACL could be read while it is being updated. So I am not in favor of this solution.

        Agreed, but IIUC the 2.8 code depends less on locking the child queue, since the ACLs of all the queues are updated in one shot during reinitialization. So to ensure ACLs are returned consistently, I presume we should hold the CS lock in CS.getQueueUserAclInfo, which is not happening currently in 2.8.

        I still prefer to fix the issue in the scheduling logic. There are other similar code paths, such as getQueueInfo.

        Hmm, so are you suggesting applying 2.9/trunk's patch, or reorganizing the flow in the CS with synchronized blocks themselves?

        Tao Yang added a comment -

        Thanks Tan, Wangda & Naganarasimha G R !

        Agreed, but IIUC the 2.8 code depends less on locking the child queue, since the ACLs of all the queues are updated in one shot during reinitialization

        We also noticed that the LeafQueue instance lock is not held when ACLs are updated (CapacityScheduler#setQueueAcls), so the current logic doesn't guarantee ACL consistency anyway.

        So to ensure ACLs are returned consistently, I presume we should hold the CS lock in CS.getQueueUserAclInfo, which is not happening currently in 2.8.

        I'm not clear on this. Is it worth ensuring ACL consistency at the cost of scheduler efficiency?

        Wangda Tan added a comment -

        I'm not clear on this. Is it worth ensuring ACL consistency at the cost of scheduler efficiency?

        It would be inefficient; previously getQueueInfo held the scheduler lock, and that caused problems.

        We also noticed that the LeafQueue instance lock is not held when ACLs are updated (CapacityScheduler#setQueueAcls), so the current logic doesn't guarantee ACL consistency anyway.

        Yeah, you're correct...
        I think we could get the queue ACL info directly from the CS by invoking authorizer#checkPermissions, with a separate lock to protect permission get/set. cc: Jian He
        But that should be a separate patch, since we would need to fix getQueueInfo as well.

        I think we can go ahead and fix the locks inside LQ#assignContainers. Thoughts?

        Wangda Tan added a comment -

        In addition, I suggest downgrading the severity to critical to unblock 2.8, since this only happens rarely.

        Tao Yang added a comment -

        Thanks Tan, Wangda.
        Updated the priority to Critical and attached a new patch for review.
        The patch needs to add indentation inside the new synchronized blocks; the diff without the whitespace changes looks like this:

           @Override
        -  public synchronized CSAssignment assignContainers(Resource clusterResource,
        +  public CSAssignment assignContainers(Resource clusterResource,
               FiCaSchedulerNode node, ResourceLimits currentResourceLimits,
               SchedulingMode schedulingMode) {
        +    synchronized (this) {
               updateCurrentResourceLimits(currentResourceLimits, clusterResource);
        
               if (LOG.isDebugEnabled()) {
        @@ -906,6 +907,7 @@ public synchronized CSAssignment assignContainers(Resource clusterResource,
               }
        
               setPreemptionAllowed(currentResourceLimits, node.getPartition());
        +    }
        
             // Check for reserved resources
             RMContainer reservedContainer = node.getReservedContainer();
        @@ -923,6 +925,7 @@ public synchronized CSAssignment assignContainers(Resource clusterResource,
               }
             }
        
        +    synchronized (this) {
               // if our queue cannot access this node, just return
               if (schedulingMode == SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY
                   && !accessibleToPartition(node.getPartition())) {
        @@ -1019,6 +1022,7 @@ public synchronized CSAssignment assignContainers(Resource clusterResource,
        
               return CSAssignment.NULL_ASSIGNMENT;
             }
        +  }
        
        Wangda Tan added a comment -

        Thanks Tao Yang, this generally looks good, but we need to wrap application.assignContainers inside the LeafQueue's synchronized lock as well. How about changing the first synchronized section like this:

            FiCaSchedulerApp reservedApp = null;
            CSAssignment reservedCSAssignment = null;
        
            synchronized (this) {
              updateCurrentResourceLimits(currentResourceLimits, clusterResource);
        
              if (LOG.isDebugEnabled()) {
                LOG.debug(
                    "assignContainers: node=" + node.getNodeName() + " #applications="
                        + orderingPolicy.getNumSchedulableEntities());
              }
        
              setPreemptionAllowed(currentResourceLimits, node.getPartition());
        
              // Check for reserved resources
              RMContainer reservedContainer = node.getReservedContainer();
              if (reservedContainer != null) {
                reservedApp = getApplication(
                    reservedContainer.getApplicationAttemptId());
                synchronized (reservedApp) {
                  reservedCSAssignment = reservedApp.assignContainers(
                      clusterResource, node, currentResourceLimits, schedulingMode,
                      reservedContainer);
                }
              }
            }
        
            // Handle possible completedContainer out of synchronized lock to avoid
            // deadlock.
            if (reservedCSAssignment != null) {
              handleExcessReservedContainer(clusterResource, reservedCSAssignment, node,
                  reservedApp);
              killToPreemptContainers(clusterResource, node, reservedCSAssignment);
              return reservedCSAssignment;
            }
            
           synchronized(this) { ... }
        
        Wangda Tan added a comment -

        Also, the patch needs to be renamed to "YARN-6029-branch-2.8.(version).patch" so that Jenkins tests it against the correct branch.

        Naganarasimha G R added a comment -

        Thanks Tan, Wangda and Tao Yang.
        Yes, I agree with Tan, Wangda that this is where we need to fix it, since the cause of the deadlock is the reversed lock order rather than the ParentQueue calling the child queue (which can happen in other flows too). And the solution looks much simpler than I initially thought!

        In addition, I suggest downgrading the severity to critical to unblock 2.8, since this only happens rarely.

        Since the solution looks simple and can go in quickly, I hope it can make 2.8 itself; based on your previous comment, I presume you plan the same.

        I also checked the other places where getParent() is called in LeafQueue and its call hierarchy; there seem to be no other similar issues that could cause a deadlock.

        Tao Yang added a comment -

        Thanks Tan, Wangda for correcting me. I failed to realize that calling application.assignContainers without the LeafQueue instance lock can cause data inconsistency for apps (applicationAttemptMap). I'll adopt your suggestion and update the patch later.

        Tao Yang added a comment -

        Patch updated.
        I've updated the assignee to Wangda Tan, since I contributed little to the final solution. Thanks.

        Naganarasimha G R added a comment -

        Thanks Tao Yang and Tan, Wangda.
        Overall I am OK with the patch, but one last clarification on the comments above:

        but we need to wrap application.assignContainers with leafqueue's synchronized lock as well.

        failed to realize that calling application.assignContainers without the LeafQueue instance lock can cause data inconsistency for apps (applicationAttemptMap).

        I don't quite understand why exactly we need the LeafQueue's lock while calling application.assignContainers, nor the data inconsistency issue for apps (applicationAttemptMap). Can you please elaborate?

        Sunil G added a comment -

        Maybe I did not correctly understand what Naganarasimha Garla meant. As far as I understood, application.assignContainers is invoked from one node heartbeat under the write lock of the LeafQueue. If another node heartbeat happens at the same time and operates on the same leaf queue, we need to hold off such invocations, correct? So I think we need that lock while operating on app allocations as well. Please add anything I missed.

        Naganarasimha G R added a comment -

        Submitting the patch to trigger Jenkins.

        Tao Yang added a comment -

        Hi Naganarasimha G R,
        I thought the application could be finished by another thread calling LQ#finishApplicationAttempt in the meantime; the app might then still be assigned a container after it has finished.

        leftnoteasy Wangda Tan added a comment -

        Just reassigned it back. Tao Yang, there is typically no need to reassign if you're still actively working on it, and your reporting and analysis of the problem is the key to the final solution.

        Latest patch looks good to me as well.

        Naganarasimha Garla, I understand there could be some potential improvements we could make to the locking of queues / apps. I tried to keep the original logic as much as possible while working on YARN-3140/3141/5706, just to be safe. Unless it becomes a performance bottleneck, I don't suggest making big changes to this logic.

        Jenkins has some issues right now; I will rekick it once it is back to normal.

        Naganarasimha Naganarasimha G R added a comment -

        Agree with you, Tan, Wangda: the less modification now, the better, and I agree that trunk already has optimizations for this.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 14m 13s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 mvninstall 8m 45s branch-2.8 passed
        +1 compile 0m 28s branch-2.8 passed with JDK v1.8.0_111
        +1 compile 0m 31s branch-2.8 passed with JDK v1.7.0_121
        +1 checkstyle 0m 19s branch-2.8 passed
        +1 mvnsite 0m 38s branch-2.8 passed
        +1 mvneclipse 0m 18s branch-2.8 passed
        +1 findbugs 1m 11s branch-2.8 passed
        +1 javadoc 0m 20s branch-2.8 passed with JDK v1.8.0_111
        +1 javadoc 0m 23s branch-2.8 passed with JDK v1.7.0_121
        +1 mvninstall 0m 30s the patch passed
        +1 compile 0m 25s the patch passed with JDK v1.8.0_111
        +1 javac 0m 25s the patch passed
        +1 compile 0m 29s the patch passed with JDK v1.7.0_121
        +1 javac 0m 29s the patch passed
        -0 checkstyle 0m 16s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 4 new + 52 unchanged - 2 fixed = 56 total (was 54)
        +1 mvnsite 0m 34s the patch passed
        +1 mvneclipse 0m 13s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 17s the patch passed
        +1 javadoc 0m 17s the patch passed with JDK v1.8.0_111
        +1 javadoc 0m 20s the patch passed with JDK v1.7.0_121
        -1 unit 73m 57s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_121.
        +1 asflicense 0m 19s The patch does not generate ASF License warnings.
        180m 34s



        Reason Tests
        JDK v1.8.0_111 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
          hadoop.yarn.server.resourcemanager.TestAMAuthorization
        JDK v1.7.0_121 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
          hadoop.yarn.server.resourcemanager.TestAMAuthorization



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:5af2af1
        JIRA Issue YARN-6029
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12845143/YARN-6029-branch-2.8.001.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux df9293be45a2 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision branch-2.8 / d61af93
        Default Java 1.7.0_121
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_111 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14530/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        unit https://builds.apache.org/job/PreCommit-YARN-Build/14530/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_121.txt
        JDK v1.7.0_121 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14530/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/14530/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        leftnoteasy Wangda Tan added a comment -

        Committing ...

        leftnoteasy Wangda Tan added a comment -

        Committed to branch-2.8. Thanks to Tao Yang for working on the patch, and thanks for the reviews from Naganarasimha G R / Li Lu / Junping Du!


          People

          • Assignee: Tao Yang
          • Reporter: Tao Yang
          • Votes: 0
          • Watchers: 12