Currently, tasks launched with the command executor have a hierarchy of processes inside their container that looks as follows:
However, the only pid in this hierarchy that the agent is aware of is the pid of the top-level mesos-containerizer launch binary.
If all of these binaries shared the same set of namespaces, this would be sufficient to discover the namespaces of the task process: we could simply inspect the namespaces of the mesos-containerizer launch pid and know they were the same for the task process.
This is true for most of the namespaces these processes live in. However, their mnt namespaces may differ. The mesos-containerizer launch binary is always in the same mnt namespace as the host, while the task process may be in its own mnt namespace if filesystem isolation is turned on and a new rootfs has been provisioned for it (e.g., a docker image was provided for it).
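As a rough illustration of how namespace membership can be inspected from a pid, the sketch below reads the `/proc/<pid>/ns/<type>` links that Linux exposes and reports which namespace types differ between two processes. The function names and the `proc_root` parameter are hypothetical (`proc_root` exists only so the logic can be exercised against a fake directory). It compares link targets such as `mnt:[4026531840]`, which embed the namespace inode number; namespaces(7) describes the stricter check of comparing device and inode numbers obtained via stat().

```python
import os

# Subset of the namespace types exposed under /proc/<pid>/ns on Linux.
# Which ones exist depends on the kernel version.
NAMESPACE_TYPES = ("ipc", "mnt", "net", "pid", "user", "uts")


def namespace_id(pid, ns_type, proc_root="/proc"):
    """Return the namespace link target for a pid, e.g. 'mnt:[4026531840]'.

    Reading another process's link is subject to a ptrace access check,
    so this may raise PermissionError for processes we cannot inspect.
    """
    return os.readlink(os.path.join(proc_root, str(pid), "ns", ns_type))


def differing_namespaces(pid_a, pid_b, proc_root="/proc"):
    """Return the set of namespace types in which pid_a and pid_b differ."""
    differing = set()
    for ns_type in NAMESPACE_TYPES:
        try:
            a = namespace_id(pid_a, ns_type, proc_root)
            b = namespace_id(pid_b, ns_type, proc_root)
        except FileNotFoundError:
            # Namespace type not supported by this kernel: skip it.
            continue
        if a != b:
            differing.add(ns_type)
    return differing
```

Applied to the top-level launch pid and the task pid, this is the kind of check that would surface the mnt namespace mismatch described above.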
This has not been a problem until now because we have never needed to enter the existing mnt namespace of a container. Even with nested containers for pods, we always create a new mnt namespace branched off the host mnt namespace (in order to support the injection of host-mounted volumes).
However, with the new debugging support we are adding, we need a way of entering the mnt namespace of a parent container instead of cloning a new one.
Since we only have access to the pid of the container's init process, we can enter all namespaces associated with that pid except the mnt namespace. The mnt namespace needs special-casing: we walk the process hierarchy until we find the first process in a different mnt namespace and enter that one instead. If no such process is found, we enter the mnt namespace of the "init" process.
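The selection logic above can be sketched as a pure function over a snapshot of the process tree. This is a hypothetical illustration, not the actual Mesos implementation: the `children` and `mnt_ns` mappings stand in for whatever the agent would read from `/proc` (for example, `/proc/<pid>/task/<tid>/children` and the `ns/mnt` links), and a breadth-first walk is assumed as the meaning of "first process". For a simple chain of processes, any walk order gives the same answer.

```python
from collections import deque

# Subset of Linux namespace types to enter (illustrative).
NAMESPACE_TYPES = ("ipc", "mnt", "net", "pid", "user", "uts")


def find_mnt_namespace_pid(init_pid, children, mnt_ns):
    """Pick the pid whose mnt namespace should be entered.

    init_pid -- pid of the container's "init" process.
    children -- hypothetical mapping: pid -> list of child pids.
    mnt_ns   -- hypothetical mapping: pid -> mnt namespace identifier.

    Walks the hierarchy breadth-first and returns the first descendant
    whose mnt namespace differs from init's; falls back to init_pid.
    """
    init_ns = mnt_ns[init_pid]
    queue = deque(children.get(init_pid, []))
    seen = {init_pid}
    while queue:
        pid = queue.popleft()
        if pid in seen:
            continue
        seen.add(pid)
        if mnt_ns[pid] != init_ns:
            return pid
        queue.extend(children.get(pid, []))
    return init_pid


def namespace_targets(init_pid, children, mnt_ns, ns_types=NAMESPACE_TYPES):
    """Map each namespace type to the pid whose namespace should be entered.

    Every namespace comes from init_pid except mnt, which is special-cased.
    """
    mnt_pid = find_mnt_namespace_pid(init_pid, children, mnt_ns)
    return {ns: (mnt_pid if ns == "mnt" else init_pid) for ns in ns_types}
```

Actually entering the namespaces would then be a matter of opening each `/proc/<pid>/ns/<type>` link and calling setns(2) on it, which this sketch leaves out.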
This is a dirty dirty hack, but should be sufficient for now.
Eventually we want to completely eliminate the command executor in favor of the "pod" (i.e. "default") executor, which doesn't have this problem at all.