[HADOOP-1461] Corner-case deadlock in TaskTracker - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 0.12.3
Fix Version/s: 0.14.0
Component/s: None
Labels: None

Description

Thanks to Koji for the attached stack-trace...

Summary:

main()
-> offerService()
-> markUnresponsiveTasks (locks the TaskTracker here)
-> purgeTask()
-> removeTaskFromJob (waiting to lock the RunningJob object)

taskCleanup
-> purgeJob (locks the RunningJob object)
-> TIP.jobHasFinished()
-> TIP.cleanup (waiting to lock the TaskTracker)

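This is a clear case of inconsistent lock ordering: main() takes the TaskTracker lock and then waits for the RunningJob lock, while taskCleanup takes the RunningJob lock and then waits for the TaskTracker lock. It is a corner case, since it depends on the child VM becoming unresponsive just as the cleanup thread kicks in, which is why I'm marking this for 0.14.0 rather than 0.13.0. What do others think about this?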
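
To illustrate the cycle in the two traces above, here is a minimal sketch, not the TaskTracker code itself. The class name, `trackerLock` and `jobLock` are hypothetical stand-ins for the TaskTracker and RunningJob monitors, and `ReentrantLock.tryLock` with a timeout is used in place of `synchronized` only so that the demonstration terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderInversionSketch {
    // Hypothetical stand-ins for the TaskTracker and RunningJob monitors.
    static final ReentrantLock trackerLock = new ReentrantLock();
    static final ReentrantLock jobLock = new ReentrantLock();

    /** Holds 'first', then tries 'second' for a bounded time; reports whether it got it. */
    static boolean holdThenTry(ReentrantLock first, ReentrantLock second,
                               CountDownLatch bothHoldFirst, CountDownLatch bothTried) {
        first.lock();
        try {
            // Wait until each thread holds its first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            boolean acquired = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                second.unlock();
            }
            // Keep holding 'first' until the other thread has finished its attempt.
            bothTried.countDown();
            bothTried.await();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    /** Returns true when neither thread could take its second lock. */
    public static boolean wouldDeadlock() {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] acquired = new boolean[2];
        // main() path: TaskTracker first, then RunningJob.
        Thread mainLike = new Thread(() ->
            acquired[0] = holdThenTry(trackerLock, jobLock, bothHoldFirst, bothTried));
        // taskCleanup path: RunningJob first, then TaskTracker.
        Thread cleanupLike = new Thread(() ->
            acquired[1] = holdThenTry(jobLock, trackerLock, bothHoldFirst, bothTried));
        mainLike.start();
        cleanupLike.start();
        try {
            mainLike.join();
            cleanupLike.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !acquired[0] && !acquired[1];
    }

    public static void main(String[] args) {
        System.out.println("would deadlock: " + wouldDeadlock());
    }
}
```

With plain `synchronized` blocks in place of the timed `tryLock`, both threads would wait forever once each holds its first lock.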

Two possible solutions to break the deadlock cycle:

a) Make TaskTracker.purgeJob a synchronized method, so that it locks the TaskTracker before locking the RunningJob object.
b) Make the TaskTracker.tasks map a Collections.synchronizedMap, doing away with the need to lock the TaskTracker in TIP.cleanup.

I'd prefer a): TaskTracker.tasks is referenced in multiple places inside synchronized methods, so b) would touch more code, and a) is the less intrusive change.
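
A rough sketch of the locking shape that option a) gives, under the assumption that both paths then take the TaskTracker lock before the RunningJob lock. `Tracker`, `Job`, `addTask` and `runBothPaths` are hypothetical names for illustration; only `purgeTask`, `purgeJob` and `cleanup` echo the methods in the traces, and their bodies here are invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class PurgeJobOrderingSketch {
    /** Hypothetical stand-in for the RunningJob object. */
    static class Job {
        final Set<String> taskIds = new HashSet<>();
    }

    /** Hypothetical stand-in for the TaskTracker; only the locking shape matters. */
    static class Tracker {
        final Map<String, Job> tasks = new HashMap<>();

        synchronized void addTask(String id, Job job) {
            tasks.put(id, job);
            synchronized (job) {
                job.taskIds.add(id);
            }
        }

        // main() path: tracker lock first, then job lock.
        synchronized void purgeTask(String id) {
            Job job = tasks.remove(id);
            if (job != null) {
                synchronized (job) {
                    job.taskIds.remove(id);
                }
            }
        }

        // Option a): purgeJob is synchronized, so it also takes tracker, then job.
        synchronized void purgeJob(Job job) {
            synchronized (job) {
                for (String id : new ArrayList<>(job.taskIds)) {
                    cleanup(id);
                }
                job.taskIds.clear();
            }
        }

        // Re-enters the tracker monitor that purgeJob already holds.
        synchronized void cleanup(String id) {
            tasks.remove(id);
        }
    }

    /** Runs both paths concurrently; returns true if both finish and no tasks remain. */
    public static boolean runBothPaths() {
        Tracker tracker = new Tracker();
        Job job = new Job();
        for (int i = 0; i < 1000; i++) {
            tracker.addTask("task_" + i, job);
        }
        Thread mainLike = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                tracker.purgeTask("task_" + i);
            }
        });
        Thread cleanupLike = new Thread(() -> tracker.purgeJob(job));
        mainLike.start();
        cleanupLike.start();
        try {
            mainLike.join(5000);
            cleanupLike.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !mainLike.isAlive() && !cleanupLike.isAlive() && tracker.tasks.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println("both paths completed: " + runBothPaths());
    }
}
```

Because Java monitors are reentrant, the synchronized `cleanup` call made from inside `purgeJob` does not block; it re-acquires a lock the thread already owns.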

Thoughts?

Attachments

HADOOP-1461_1_20070605.patch, 05/Jun/07 11:20, 0.6 kB, Arun Murthy
main_taskcleanup_deadlock.txt, 05/Jun/07 10:35, 2 kB, Arun Murthy

People

Assignee: Arun Murthy

Reporter: Arun Murthy

Votes: 0

Watchers: 0

Dates

Created: 05/Jun/07 10:34

Updated: 08/Jul/09 16:52

Resolved: 06/Jun/07 05:45