This bug in glibc (fixed in glibc 2.25) can cause the child of a fork() to fail an assertion incorrectly if the parent entered a new pid namespace before forking:
The LinuxLauncher code happens to do this when launching nested containers:
- The MesosContainerizer process launches a subprocess, passing a customized ns::clone function as an argument. The calling thread then waits for the launch to succeed and return a child PID: https://github.com/apache/mesos/blob/1.3.x/src/slave/containerizer/mesos/linux_launcher.cpp#L495
- A separate thread in the Mesos agent forks and then waits for the grandchild to report a PID: https://github.com/apache/mesos/blob/1.3.x/src/linux/ns.hpp#L453
- The child of the fork first enters the namespaces (including a pid namespace) and then forks a grandchild. The child then calls waitpid on the grandchild: https://github.com/apache/mesos/blob/1.3.x/src/linux/ns.hpp#L555
- Due to the glibc bug, the grandchild sometimes never returns from the fork here: https://github.com/apache/mesos/blob/1.3.x/src/linux/ns.hpp#L540
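For reference, the process structure in the steps above can be sketched in C roughly as follows. This only illustrates the fork, enter-namespace, fork, waitpid sequence. As an assumption to keep the sketch self-contained, `unshare(CLONE_NEWPID)` stands in for the `setns()` calls in ns.hpp. The function and constant names are hypothetical, and the sketch is not claimed to reproduce the glibc failure.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for unshare() and CLONE_NEWPID in <sched.h> */
#endif
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum { GRANDCHILD_OK = 0, FAILED = 1, NAMESPACE_UNAVAILABLE = 77 };

/* Wait for pid; return its exit code, or -1 if it did not exit normally. */
static int wait_exit_code(pid_t pid) {
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical helper mirroring the child/grandchild structure above. */
static int fork_into_new_pid_namespace(void) {
    pid_t child = fork();
    if (child < 0) return FAILED;

    if (child == 0) {
        /* Child: arrange for a new pid namespace. Assumption: unshare()
         * replaces the setns() calls used by Mesos. Processes forked after
         * this call land in the new namespace; the caller itself stays put. */
        if (unshare(CLONE_NEWPID) != 0) _exit(NAMESPACE_UNAVAILABLE);

        /* This is the fork() that the bug report is about. */
        pid_t grandchild = fork();
        if (grandchild < 0) _exit(FAILED);
        if (grandchild == 0) _exit(GRANDCHILD_OK);

        /* Child waits for the grandchild and reports its result. */
        _exit(wait_exit_code(grandchild) == GRANDCHILD_OK ? GRANDCHILD_OK
                                                          : FAILED);
    }

    /* Parent: fork() succeeded, so child holds a real PID here. */
    assert(child > 0);
    return wait_exit_code(child);
}
```

Creating a pid namespace needs CAP_SYS_ADMIN, so without privileges the sketch returns NAMESPACE_UNAVAILABLE instead of forking the grandchild.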
The glibc bug report suggests a workaround:
The obvious solution is just to use clone() after setns() and never use fork() - and one can certainly patch both programs to do so. Nevertheless it would be nice to see if fork() also worked after setns(), especially since there is no inherent reason for it not to.
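A minimal sketch of that workaround, assuming the glibc `clone()` wrapper is an acceptable replacement for `fork()` at the point where the grandchild is created. `exit_with_code`, `spawn_with_clone` and `run_with_clone` are hypothetical names (not Mesos or glibc APIs), and the stack size is arbitrary.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for clone() in <sched.h> */
#endif
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024) /* arbitrary size for this sketch */

/* Hypothetical child entry point: exit with the code passed in arg. */
static int exit_with_code(void *arg) {
    /* The clone() wrapper terminates the child with this return value. */
    return *(int *)arg;
}

/* Hypothetical helper: create a child with clone() instead of fork().
 * Returns the child's PID, or -1 on error. */
static pid_t spawn_with_clone(int (*fn)(void *), void *arg) {
    assert(fn != NULL);

    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL) return -1;

    /* No CLONE_VM: the child gets its own copy of memory, as with fork().
     * The stack grows down on common architectures, so pass its top. */
    pid_t pid = clone(fn, stack + CHILD_STACK_SIZE, SIGCHLD, arg);

    /* Frees only the parent's copy; the child's copy is unaffected. */
    free(stack);
    return pid;
}

/* Spawn via clone() and wait; returns the exit code, or -1 on failure. */
static int run_with_clone(int (*fn)(void *), void *arg) {
    pid_t pid = spawn_with_clone(fn, arg);
    if (pid < 0) return -1;

    int status;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In the sequence above, the child would call something like `spawn_with_clone()` after entering the namespaces instead of calling `fork()`. Note that `clone()` does not run `pthread_atfork` handlers, so code that relies on them would need separate handling.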