Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Using process::loop() together with the common libprocess pattern (a Process wrapper plus dispatching) is prone to deadlock on libprocess termination if the code does not wait for the loop to exit before terminating.
The deadlock itself is not directly caused by the process::loop(), though.
It occurs in the following setup with two processes (let's name them A and B).
Thread 1 tries to clean up process A. It locks processes_mutex and hangs here:
https://github.com/apache/mesos/blob/663bfa68b6ab68f4c28ed6a01ac42ac2ad23ac07/3rdparty/libprocess/src/process.cpp#L3079
waiting for process A to have no strong references.
Thread 2 begins by creating a ProcessReference in ProcessManager::deliver(UPID&) called for process A: https://github.com/apache/mesos/blob/663bfa68b6ab68f4c28ed6a01ac42ac2ad23ac07/3rdparty/libprocess/src/process.cpp#L2799
and ends up waiting for processes_mutex in ProcessManager::terminate() for process B:
https://github.com/apache/mesos/blob/663bfa68b6ab68f4c28ed6a01ac42ac2ad23ac07/3rdparty/libprocess/src/process.cpp#L3155
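The circular wait described above can be modeled with plain standard-library primitives. This is only a sketch under stated assumptions: `WaitState`, `simulateCircularWait`, and `refsToA` are hypothetical names, the mutex merely stands in for processes_mutex, and none of this is libprocess code. It checks each thread's blocking condition once instead of actually hanging.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical result type: which of the two threads cannot make progress.
struct WaitState {
  bool cleanupBlocked;    // Thread 1: cleanup of A still sees a strong reference
  bool terminateBlocked;  // Thread 2: terminate of B cannot take the mutex
};

// Models the two-thread setup without hanging: each blocking condition
// is evaluated once rather than waited on forever.
WaitState simulateCircularWait() {
  std::mutex processes_mutex;   // stand-in for libprocess's processes_mutex
  std::atomic<int> refsToA{0};  // stand-in for strong references to process A

  // Thread 2, step 1: delivering to A takes a strong reference to A.
  refsToA.fetch_add(1);

  // Thread 1: cleanup of A locks the mutex, then needs refsToA to reach 0.
  std::lock_guard<std::mutex> cleanupLock(processes_mutex);
  bool cleanupBlocked = refsToA.load() > 0;

  // Thread 2, step 2: while still holding the reference, terminating B
  // needs the same mutex. try_lock is used so the model returns instead
  // of blocking as the real code does.
  bool terminateBlocked = false;
  std::thread thread2([&] {
    if (processes_mutex.try_lock()) {
      processes_mutex.unlock();
    } else {
      terminateBlocked = true;
    }
  });
  thread2.join();

  // The reference is only dropped after terminate returns, which never
  // happens in the real deadlock, so both conditions hold at once.
  return {cleanupBlocked, terminateBlocked};
}
```

In this model both flags come back set: the cleanup side holds the mutex and waits for the reference to go away, while the reference holder waits for the mutex.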
-----------------
In the observed case, terminate() for process B was triggered by a destructor of a process-wrapping object owned by a libprocess loop executing on A.
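The description notes that the problem arises when code does not wait for the loop to exit before termination. A minimal sketch of that ordering follows, using only the C++ standard library. `LoopOwner`, its `shutdown()` method, and the log strings are hypothetical stand-ins for illustration and do not reflect the libprocess API or the actual fix.

```cpp
#include <atomic>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Hypothetical owner of a background loop; a stand-in, not a libprocess type.
class LoopOwner {
public:
  explicit LoopOwner(std::vector<std::string>* log) : log_(log) {
    finished_ = std::async(std::launch::async, [this] {
      while (!stop_.load()) {
        std::this_thread::yield();  // stand-in for one loop iteration
      }
      // Objects owned by the loop body would be released by this point,
      // before shutdown() is allowed to proceed.
      log_->push_back("loop exited");
    });
  }

  ~LoopOwner() {
    // If shutdown() was never called, still stop and join the loop.
    if (finished_.valid()) shutdown();
  }

  void shutdown() {
    stop_.store(true);              // ask the loop to stop
    finished_.get();                // block until the loop has returned
    log_->push_back("terminated");  // only now would termination begin
  }

private:
  std::vector<std::string>* log_;
  std::atomic<bool> stop_{false};
  std::future<void> finished_;
};
```

The point of the ordering is that destructors of objects owned by the loop run while the surrounding runtime is still fully alive, rather than racing with its teardown.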
I'm attaching the stacks captured at the deadlock. Stacks of the threads which lock one another are in deadlock_stacks_filtered.txt. Note frame #1 in Thread 5 (waiting for all references to expire) and frames #48 and #8 in Thread 19 (creating a reference and waiting for processes_mutex).
Attachments
Issue Links
- blocks: MESOS-7258 Provide scheduler calls to subscribe to additional roles and unsubscribe from roles. (Resolved)