Details
- Bug
- Status: Open
- Major
- Resolution: Unresolved
- 1.3.2
- None
- None
Description
This issue is related to https://issues.apache.org/jira/browse/MESOS-8004. A container may fail to terminate for a variety of reasons. A common one in our system is that containers relying on external storage run fsync before exiting (they fsync on SIGTERM), which can cause the termination to time out.
Even though Mesos has sent the requisite kill signals, the task will never terminate because the cgroup stays frozen.
The intended behaviour on failure to terminate should be as follows: if the pids isolator is running, pids.max is set to 0 to prevent further processes from being created; the cgroup is then walked and every process in it is sent SIGKILL; finally, the cgroup is thawed. Once the processes have thawed, the kill signal is delivered and processed, and the container finally terminates.
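The proposed sequence could look roughly like the sketch below. It is an illustration only, not Mesos code: the function name `force_destroy`, its parameters, and the injectable `kill` callback are hypothetical, and it assumes cgroup v1 control files (`pids.max`, `cgroup.procs`, `freezer.state`) with the freezer and pids hierarchies mounted separately. It also handles a single cgroup and does not descend into nested cgroups.

```python
import os
import signal
from pathlib import Path
from typing import Callable, List, Optional


def force_destroy(
    freezer_cgroup: Path,
    pids_cgroup: Optional[Path] = None,
    kill: Callable[[int, int], None] = os.kill,
) -> List[int]:
    """Hypothetical helper: fence, SIGKILL, then thaw a stuck cgroup.

    Returns the PIDs that were successfully signalled.
    """
    # 1. If the pids isolator is in use, forbid new processes so that
    #    nothing can fork between the kill and the thaw.
    if pids_cgroup is not None:
        (pids_cgroup / "pids.max").write_text("0")

    # 2. Walk the cgroup and queue SIGKILL for every process in it.
    #    Frozen processes do not act on the signal yet; it stays pending.
    signalled: List[int] = []
    for entry in (freezer_cgroup / "cgroup.procs").read_text().split():
        pid = int(entry)
        try:
            kill(pid, signal.SIGKILL)
            signalled.append(pid)
        except ProcessLookupError:
            # The process exited on its own; nothing left to do for it.
            pass

    # 3. Thaw the cgroup so the pending SIGKILLs are delivered.
    (freezer_cgroup / "freezer.state").write_text("THAWED")
    return signalled
```

Passing `kill` as a parameter is only a convenience so the sequence can be exercised against a scratch directory without real processes; a real implementation would also need to wait for the cgroup to become empty after the thaw.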
Issue Links
- is related to MESOS-8004 Failed to kill all processes in the container due to cgroup freeze failure (Resolved)