Details
- Type: Task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Implement the solution described in MESOS-3352 in the LinuxLauncher.
To keep Systemd from migrating cgroup pids, we can use the delegate=true flag. It prevents Systemd from migrating pids that are descendants of the process launched by a Systemd unit.
For this strategy to work, the Systemd version must support the delegate flag. Support was introduced in Systemd v218; however, it has also been backported to v208 for RHEL7 and CentOS7 in the package systemd-208-20. Upgrading to this package is highly recommended on those operating systems.
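The version requirement above can be sketched as a small check. This is only an illustration: the helper names are hypothetical, the parser assumes the first line of `systemctl --version` looks like `systemd 219`, and it assumes that package releases of 208 newer than 20 also carry the backport.

```python
import re
from typing import Optional

# Thresholds taken from the description above.
FIRST_UPSTREAM_VERSION = 218
BACKPORT_VERSION = 208
# Assumption: 208 package releases after 20 keep the backported flag.
BACKPORT_MIN_RELEASE = 20


def parse_systemd_version(version_output: str) -> int:
    """Extract the version number from the first line of `systemctl --version`."""
    match = re.match(r"systemd (\d+)", version_output)
    if match is None:
        raise ValueError("unrecognized version output: %r" % version_output)
    return int(match.group(1))


def supports_delegate(version: int, package_release: Optional[int] = None) -> bool:
    """Return True if this Systemd version is expected to honor the delegate flag.

    `package_release` is the distribution package release (the "20" in
    systemd-208-20); it only matters for the RHEL7/CentOS7 backport.
    """
    if version >= FIRST_UPSTREAM_VERSION:
        return True
    return (
        version == BACKPORT_VERSION
        and package_release is not None
        and package_release >= BACKPORT_MIN_RELEASE
    )
```

An agent could run such a check at startup and refuse to rely on delegation, or warn the operator, when it returns False.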
Once the delegate=true flag has been set, the cgroups that are manually manipulated by the agent will no longer be migrated during the lifetime of the agent.
This still leaves the problem of tasks being migrated after the agent has stopped running (voluntarily or not). To deal with this problem, we propose the following solution:
If an agent is running on a Systemd-initialized machine, then the agent will create a Systemd slice with delegate=true and a lifetime that is independent of the agent. The linux launcher (used when cgroups isolators are enabled) will then assign the cgroup name for any executor that is launched to this separate slice. As a consequence, when the agent unit is terminated, the separate slice will continue to delegate the cgroups, preventing Systemd from migrating the pids. A side benefit is that we can keep the KillMode=control-group flag on the agent and terminate all agent-specific services, such as the fetcher, without terminating the tasks. This provides for a clean clean-up.
This solution still requires that the agent unit be launched with the delegate=true flag, so that there is no race during the transition of the pids from the agent to the separate slice.
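As a rough illustration of the agent-side settings discussed above, the following sketch renders a Systemd service unit that combines the two flags. In unit files the directives are spelled `Delegate=true` and `KillMode=control-group`. The function name, the description string, and the `ExecStart` path are hypothetical placeholders, not the unit shipped with any package.

```python
def render_agent_unit(exec_start: str) -> str:
    """Render a minimal service unit for the agent (illustrative only)."""
    lines = [
        "[Unit]",
        "Description=Mesos agent (illustrative)",
        "",
        "[Service]",
        "ExecStart=" + exec_start,
        # Keep Systemd from migrating pids below the agent's cgroup while
        # executors are being moved to the separate slice.
        "Delegate=true",
        # Stopping the unit kills everything left in the agent's own control
        # group (for example the fetcher), but not tasks in the other slice.
        "KillMode=control-group",
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical binary path, used only for the demonstration.
    print(render_agent_unit("/usr/sbin/mesos-agent"))
```

Because the executors live in a slice outside the agent's control group, `KillMode=control-group` can stay enabled on the agent without affecting running tasks.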
The agent will be responsible for verifying the slice is still available upon recovery, and warning the operator if it notices that the tasks it is recovering are no longer associated with this separate slice, as this can cause silent loss of isolation of existing tasks.
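The recovery check described above could look roughly like the sketch below. It parses the documented `/proc/<pid>/cgroup` line format (`hierarchy-ID:controller-list:cgroup-path`) and reports pids whose path is no longer under the slice. The function names, the slice name `mesos_executors.slice` in the usage comment, and the choice of the `name=systemd` hierarchy as the one to inspect are assumptions made for illustration.

```python
from typing import Dict, List


def parse_proc_cgroup(content: str) -> Dict[str, str]:
    """Map each controller list (e.g. 'name=systemd') to its cgroup path."""
    result: Dict[str, str] = {}
    for line in content.splitlines():
        if not line.strip():
            continue
        # Format: hierarchy-ID:controller-list:cgroup-path
        _hierarchy_id, controllers, path = line.split(":", 2)
        result[controllers] = path
    return result


def in_slice(cgroup_path: str, slice_name: str) -> bool:
    """True if one component of the cgroup path is the given slice."""
    return slice_name in cgroup_path.strip("/").split("/")


def pids_outside_slice(
    proc_cgroups: Dict[int, str],
    slice_name: str,
    hierarchy: str = "name=systemd",  # assumption: hierarchy to inspect
) -> List[int]:
    """Return the pids whose cgroup in `hierarchy` is not under `slice_name`.

    `proc_cgroups` maps a pid to the text of its /proc/<pid>/cgroup file.
    A pid with no entry for the hierarchy is reported as outside.
    """
    outside = []
    for pid, content in proc_cgroups.items():
        path = parse_proc_cgroup(content).get(hierarchy)
        if path is None or not in_slice(path, slice_name):
            outside.append(pid)
    return sorted(outside)


# Example use during recovery (slice name is a hypothetical placeholder):
#   lost = pids_outside_slice(snapshots, "mesos_executors.slice")
#   if lost: warn the operator that these tasks may have lost isolation.
```

Taking the file contents as input rather than reading `/proc` directly keeps the check easy to test against recorded snapshots.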
Issue Links
- is related to MESOS-1113 Refactor cgroup interface in preparation for Systemd NWO. (Accepted)