Mesos / MESOS-9116

Launch nested container session fails due to incorrect detection of `mnt` namespace of command executor's task.

    Details

    • Sprint:
      Mesosphere Sprint 2018-26, Mesosphere Sprint 2018-27
    • Story Points:
      8

      Description

      Launch nested container call might fail with the following error:

      Failed to enter mount namespace: Failed to open '/proc/29473/ns/mnt': No such file or directory
      

      This happens when the containerizer launcher tries to enter the `mnt` namespace using the pid of a process that has already terminated. The agent detected the pid before spawning the containerizer launcher process, while the process was still running.
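
The failure mode can be illustrated outside Mesos with a minimal Python sketch. The helper `open_mnt_namespace` and its `proc_root` parameter are hypothetical names introduced for illustration; the real launcher is written in C++ and is not reproduced here. The sketch uses a temporary directory as a stand-in for `/proc` and only shows that opening `<proc_root>/<pid>/ns/mnt` fails with "No such file or directory" once the entry for the pid has disappeared.

```python
import os
import tempfile


def open_mnt_namespace(pid, proc_root="/proc"):
    """Open the mnt namespace file of `pid` and return a file descriptor.

    Raises FileNotFoundError if the process has already terminated,
    because its directory under `proc_root` no longer exists.
    """
    path = os.path.join(proc_root, str(pid), "ns", "mnt")
    return os.open(path, os.O_RDONLY)


def demo():
    # Assumption: a fake procfs tree, so the sketch runs on any system.
    with tempfile.TemporaryDirectory() as proc_root:
        pid = 29473  # pid taken from the error message above
        pid_dir = os.path.join(proc_root, str(pid))
        ns_dir = os.path.join(pid_dir, "ns")
        os.makedirs(ns_dir)
        open(os.path.join(ns_dir, "mnt"), "w").close()

        # Step 1: the pid is detected while the process is still alive.
        fd = open_mnt_namespace(pid, proc_root)
        os.close(fd)

        # Step 2: the process exits before the launcher uses the pid.
        os.remove(os.path.join(ns_dir, "mnt"))
        os.rmdir(ns_dir)
        os.rmdir(pid_dir)

        # Step 3: the launcher tries to open the namespace and fails.
        try:
            open_mnt_namespace(pid, proc_root)
        except FileNotFoundError as error:
            return str(error)
    return None
```

Calling `demo()` returns the text of the `FileNotFoundError`, which names the missing `ns/mnt` path in the same way as the error reported in this issue.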

      The issue can be reproduced using the following test (pseudocode):

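      # Start a long-running task under the command executor.
      launchTask("sleep 1000")
      
      # The task's container is the parent of every nested container below.
      parentContainerId = containerizer.containers().begin()
      
      outputs = []
      for i in range(10):
        containerId = ContainerId()
        containerId.parent = parentContainerId
        containerId.id = UUID.random()
      
        # Each short-lived nested container prints "echo" and exits.
        LAUNCH_NESTED_CONTAINER_SESSION(containerId, "echo echo")
        response = ATTACH_CONTAINER_OUTPUT(containerId)
        outputs.append(response.reader)
      
      # Every nested container must have produced its output.
      for output in outputs:
        stdout, stderr = getProcessIOData(output)
        assert("echo" == stdout + stderr)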

      When we start the very first nested container, `getMountNamespaceTarget()` returns the PID of the task (`sleep 1000`), because it is the only process whose `mnt` namespace differs from that of the parent container. This nested container becomes a child of the PID 1 process, which is also the parent of the command executor; it is not a child of the executor. This can be seen in the attached `pstree.png`.

      When we start a second nested container, `getMountNamespaceTarget()` might return the PID of the previous nested container (`echo echo`) instead of the task's PID (`sleep 1000`). This happens because the first nested container entered the `mnt` namespace of the task, so its `mnt` namespace also differs from that of the parent container. The containerizer launcher (the "nanny" process) then attempts to enter the `mnt` namespace using the PID of a process that has already terminated, which produces the error above.
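
The ambiguity described in the two paragraphs above can be modeled with a small Python sketch. This is a simplified model, not the actual `getMountNamespaceTarget()` implementation: the `Process` type, the helper names, the pids, and the namespace ids are all hypothetical, and the task is assumed to be a direct child of the command executor. The sketch lists every descendant of PID 1 whose `mnt` namespace differs from that of PID 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    pid: int
    ppid: int
    mnt_ns: str  # stands in for the target of /proc/<pid>/ns/mnt
    name: str


def descendants(processes, root_pid):
    """Return all descendants of root_pid in breadth-first order."""
    found, queue = [], [root_pid]
    while queue:
        parent = queue.pop(0)
        children = [p for p in processes if p.ppid == parent]
        found.extend(children)
        queue.extend(p.pid for p in children)
    return found


def mnt_namespace_candidates(processes, root_pid):
    """Return descendants whose mnt namespace differs from root_pid's."""
    root = next(p for p in processes if p.pid == root_pid)
    return [p for p in descendants(processes, root_pid)
            if p.mnt_ns != root.mnt_ns]


# Hypothetical process tree of the parent container.
init = Process(1, 0, "mnt:[100]", "PID 1")
executor = Process(2, 1, "mnt:[100]", "command executor")
task = Process(3, 2, "mnt:[200]", "sleep 1000")

# Before the first nested container: the task is the only candidate.
before = mnt_namespace_candidates([init, executor, task], 1)

# The first nested container is a child of PID 1 and has entered the
# task's mnt namespace, so it becomes a second candidate.
nested = Process(4, 1, "mnt:[200]", "echo echo")
after = mnt_namespace_candidates([init, executor, task, nested], 1)
```

In this model `before` contains only the `sleep 1000` process, while `after` contains both `sleep 1000` and `echo echo`. Any detection that picks one process from `after` can select the short-lived `echo echo` process, whose pid is no longer valid once it exits.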

        Attachments

        1. pstree.png
          46 kB
          Andrei Budnik

            People

            • Assignee:
              abudnik Andrei Budnik
              Reporter:
              abudnik Andrei Budnik
              Shepherd:
              Alex R
