Thanks to Koji for the attached stack-trace...
-> markUnresponsiveTasks (locks the TaskTracker here)
-> removeTaskFromJob (waiting to lock the RunningJob object)
-> purgeJob (locks the RunningJob object)
-> TIP.cleanup (waiting to lock the TaskTracker)
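To make the inversion explicit, here is a minimal sketch of the two code paths in the trace. The class `LockOrderSketch` and the plain `Object` monitors are hypothetical stand-ins, not the real TaskTracker or RunningJob classes. The paths run one after the other on a single thread, so nothing can deadlock here; the sketch only records the order in which each path takes the locks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; not the actual Hadoop code.
class LockOrderSketch {
    static final Object taskTracker = new Object(); // stands in for the TaskTracker monitor
    static final Object runningJob = new Object();  // stands in for a RunningJob monitor

    // Path 1: markUnresponsiveTasks -> removeTaskFromJob
    static List<String> markUnresponsivePath() {
        List<String> order = new ArrayList<>();
        synchronized (taskTracker) {        // markUnresponsiveTasks holds the TaskTracker
            order.add("TaskTracker");
            synchronized (runningJob) {     // removeTaskFromJob then wants the RunningJob
                order.add("RunningJob");
            }
        }
        return order;
    }

    // Path 2: purgeJob -> TIP.cleanup
    static List<String> purgeJobPath() {
        List<String> order = new ArrayList<>();
        synchronized (runningJob) {         // purgeJob holds the RunningJob
            order.add("RunningJob");
            synchronized (taskTracker) {    // TIP.cleanup then wants the TaskTracker
                order.add("TaskTracker");
            }
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(markUnresponsivePath()); // [TaskTracker, RunningJob]
        System.out.println(purgeJobPath());         // [RunningJob, TaskTracker]
    }
}
```

If the two paths run on separate threads and each takes its first lock before the other takes its second, neither can proceed.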
A clear case of lock-ordering issues during synchronization... it's a corner-case since it depends on the child-vm becoming unresponsive and the cleanup thread kicking in at the same time; which is why I'm marking this for 0.14.0 rather than 0.13.0 - what do others think about this?
Two possible solutions to break the deadlock cycle:
a) Make TaskTracker.purgeJob a synchronized method, so that it locks the TaskTracker before locking the RunningJob object.
b) Make the TaskTracker.tasks map a Collections.synchronizedMap, thus doing away with the need to lock the TaskTracker in TIP.cleanup
I'd prefer a) since TaskTracker.tasks is already referenced in multiple places from synchronized methods... changing only purgeJob is therefore the less intrusive change.
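A rough sketch of what option a) amounts to, under heavy simplification: `TrackerSketch`, its fields, and `runBothPaths` are hypothetical names, and TIP.cleanup is modelled as a synchronized method on the tracker itself. It is not the actual patch. The point is only that both paths now take the tracker lock first and the job lock second, so the cycle cannot form.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of option a); not the real TaskTracker.
class TrackerSketch {
    private final Map<String, String> tasks = new HashMap<>();
    private final Object runningJob = new Object(); // stands in for a RunningJob

    // Locks the tracker, then the job (same order as in the stack trace).
    synchronized void markUnresponsiveTasks() {
        removeTaskFromJob("task_0");
    }

    private void removeTaskFromJob(String taskId) {
        synchronized (runningJob) {
            tasks.remove(taskId);
        }
    }

    // Option a): purgeJob is synchronized, so it also locks the tracker first.
    synchronized void purgeJob() {
        synchronized (runningJob) {
            cleanup();
        }
    }

    // Models TIP.cleanup: needs the tracker lock, which the caller already
    // holds; Java monitors are reentrant, so this does not block.
    private synchronized void cleanup() {
        tasks.clear();
    }

    // Runs both paths concurrently; returns true if both threads finish.
    static boolean runBothPaths() {
        TrackerSketch tt = new TrackerSketch();
        Thread a = new Thread(() -> { for (int i = 0; i < 10000; i++) tt.markUnresponsiveTasks(); });
        Thread b = new Thread(() -> { for (int i = 0; i < 10000; i++) tt.purgeJob(); });
        a.setDaemon(true); // daemon threads so a hang cannot keep the JVM alive
        b.setDaemon(true);
        a.start();
        b.start();
        try {
            a.join(5000);
            b.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !a.isAlive() && !b.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("both paths completed: " + runBothPaths());
    }
}
```

The trade-off is that purgeJob now holds the TaskTracker lock for its whole duration, which is coarser than before but keeps the change confined to one method.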