[MESOS-7858] Launching a nested container with namespace/pid isolation, with glibc < 2.25, may deadlock the LinuxLauncher and MesosContainerizer - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.2.1, 1.3.0
Fix Version/s: 1.2.3, 1.3.2, 1.4.0
Component/s: containerization
Labels:
- health-check
- mesosphere

Sprint: Mesosphere Sprint 61, Mesosphere Sprint 62
Story Points: 5

Description

A bug in glibc (fixed in glibc 2.25) sometimes causes the child process of a fork() to fail an assertion incorrectly if the parent entered a new pid namespace before forking:
https://sourceware.org/bugzilla/show_bug.cgi?id=15392
https://sourceware.org/bugzilla/show_bug.cgi?id=21386

The LinuxLauncher code happens to follow this pattern when launching nested containers:

- The MesosContainerizer process launches a subprocess, passing a customized ns::clone function as an argument. That thread then waits for the launch to succeed and return a child PID: https://github.com/apache/mesos/blob/1.3.x/src/slave/containerizer/mesos/linux_launcher.cpp#L495
- A separate thread in the Mesos agent forks and then waits for the grandchild to report a PID: https://github.com/apache/mesos/blob/1.3.x/src/linux/ns.hpp#L453
- The child of the fork first enters the namespaces (including a pid namespace) and then forks a grandchild. The child then calls waitpid on the grandchild: https://github.com/apache/mesos/blob/1.3.x/src/linux/ns.hpp#L555
- Due to the glibc bug, the grandchild sometimes never returns from the fork here: https://github.com/apache/mesos/blob/1.3.x/src/linux/ns.hpp#L540

According to the glibc bug report, we can work around this by calling clone() instead of fork() after setns(). Quoting the report:

"The obvious solution is just to use clone() after setns() and never use fork() - and one can certainly patch both programs to do so. Nevertheless it would be nice to see if fork() also worked after setns(), especially since there is no inherent reason for it not to."
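
As a rough illustration of the clone()-instead-of-fork() idea from the glibc report, here is a minimal sketch of creating the grandchild with clone(). It is not the actual Mesos patch: the names runChildViaClone and exitWithStatus are hypothetical, the setns() call is only indicated by a comment (it needs privileges), and the 1 MiB stack size is an arbitrary assumption.

```cpp
#include <sched.h>      // clone()
#include <signal.h>     // SIGCHLD
#include <sys/types.h>  // pid_t
#include <sys/wait.h>   // waitpid(), WIFEXITED, WEXITSTATUS

#include <vector>

// Hypothetical child entry point: exits with the status that `arg` points to.
int exitWithStatus(void* arg)
{
  return *static_cast<int*>(arg);
}

// Hypothetical helper: runs `fn(arg)` in a new process created with clone()
// rather than fork(), waits for it, and returns its exit status
// (or -1 on any error or abnormal termination).
int runChildViaClone(int (*fn)(void*), void* arg)
{
  // In the scenario from this ticket, the caller would already have
  // entered the target namespaces with setns() before reaching this point.

  // clone() needs an explicit stack for the new process. Without CLONE_VM
  // the child gets its own copy of this memory. The stack grows downward
  // on common Linux architectures, so we pass the top of the buffer.
  std::vector<char> stack(1024 * 1024);
  char* stackTop = stack.data() + stack.size();

  // SIGCHLD as the termination signal makes the child waitable with
  // waitpid(), as a child created by fork() would be.
  pid_t pid = ::clone(fn, stackTop, SIGCHLD, arg);
  if (pid == -1) {
    return -1;
  }

  int status = 0;
  if (::waitpid(pid, &status, 0) != pid) {
    return -1;
  }

  // The value returned by `fn` becomes the child's exit status.
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller could use it as `int code = 7; runChildViaClone(exitWithStatus, &code);`. The sketch only shows the shape of the call; whether it avoids the assertion on a given glibc version is something to verify against the linked glibc reports.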

Issue Links

relates to: MESOS-6656 Nested containers can become unkillable (Resolved)

People

Assignee: Jie Yu
Reporter: Joseph Wu
Shepherd: Benjamin Mahler
Votes: 0
Watchers: 4

Dates

Created: 04/Aug/17 02:39
Updated: 24/Aug/17 04:26
Resolved: 24/Aug/17 04:26