MESOS-9116

Launching a nested container session fails due to incorrect detection of the `mnt` namespace of the command executor's task.


Details

    • Mesosphere Sprint 2018-26, Mesosphere Sprint 2018-27
    • 8

    Description

      The launch nested container call might fail with the following error:

      Failed to enter mount namespace: Failed to open '/proc/29473/ns/mnt': No such file or directory
      

      This happens when the containerizer launcher tries to enter the `mnt` namespace using the PID of a process that has already terminated. The agent detected that PID before spawning the containerizer launcher process, while the target process was still running.
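      The failure mode can be illustrated with a minimal sketch that reads a process's `mnt` namespace link the same way the error message suggests, through `/proc/<pid>/ns/mnt`. The function name `mnt_namespace_id` and the `proc_root` parameter are hypothetical and exist only for this illustration; this is not the Mesos implementation.

```python
import os
from typing import Optional


def mnt_namespace_id(pid: int, proc_root: str = "/proc") -> Optional[str]:
    """Return the link target of <proc_root>/<pid>/ns/mnt.

    Returns None when the entry does not exist, which is what happens
    once the process with that PID has terminated.
    """
    path = os.path.join(proc_root, str(pid), "ns", "mnt")
    try:
        return os.readlink(path)
    except FileNotFoundError:
        # Same condition as "No such file or directory" in the error above.
        return None
```

      A PID that was valid when it was looked up can yield `None` a moment later, so any step that runs between detection and use (here, spawning the launcher) opens a window for this error.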

      The issue can be reproduced using the following test (pseudocode):

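      # Launch a long-running task via the command executor.
      launchTask("sleep 1000")

      # Use the task's container as the parent of every nested container.
      parentContainerId = containerizer.containers().begin()

      outputs = []
      for i in range(10):
        containerId = ContainerId()
        containerId.parent = parentContainerId
        containerId.id = UUID.random()

        # Each session runs a short-lived command.
        LAUNCH_NESTED_CONTAINER_SESSION(containerId, "echo echo")
        response = ATTACH_CONTAINER_OUTPUT(containerId)
        outputs.append(response.reader)

      # Every session must have produced the expected output.
      for output in outputs:
        stdout, stderr = getProcessIOData(output)
        assert("echo" == stdout + stderr)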

      When we start the very first nested container, `getMountNamespaceTarget()` returns the PID of the task (`sleep 1000`), because it is the only process whose `mnt` namespace differs from that of the parent container. This nested container becomes a child of the PID 1 process, which is also the parent of the command executor; it is not a child of the executor. This can be seen in the attached `pstree.png`.

      When we start a second nested container, `getMountNamespaceTarget()` might return the PID of the previous nested container (`echo echo`) instead of the task's PID (`sleep 1000`). This happens because the first nested container entered the `mnt` namespace of the task, so it also differs from the parent container. The containerizer launcher (the "nanny" process) then attempts to enter the `mnt` namespace using the PID of a terminated process, which produces the error above.
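      The ambiguity described above can be reproduced in a toy model. The sketch below is an assumption-laden simplification and not the actual `getMountNamespaceTarget()` code: the `Proc` type, the `mnt_target_candidates` helper, the PIDs, and the namespace labels `ns-A`/`ns-B` are all invented for illustration. It treats every descendant of the container's PID 1 process whose `mnt` namespace differs from PID 1's as a possible target.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Proc:
    pid: int
    ppid: int
    mnt_ns: str  # opaque namespace label, e.g. "ns-A"
    name: str


def descendants(procs: Dict[int, Proc], root: int) -> List[Proc]:
    """Collect all direct and indirect children of `root`."""
    found: List[Proc] = []
    stack = [root]
    while stack:
        parent = stack.pop()
        for proc in procs.values():
            if proc.ppid == parent:
                found.append(proc)
                stack.append(proc.pid)
    return found


def mnt_target_candidates(procs: Dict[int, Proc], init_pid: int) -> List[Proc]:
    """Descendants whose mnt namespace differs from the init process's."""
    parent_ns = procs[init_pid].mnt_ns
    matches = [p for p in descendants(procs, init_pid) if p.mnt_ns != parent_ns]
    return sorted(matches, key=lambda p: p.pid)


# Hypothetical process tree before any nested container is launched.
procs = {
    1: Proc(1, 0, "ns-A", "init"),
    2: Proc(2, 1, "ns-A", "executor"),
    3: Proc(3, 2, "ns-B", "sleep 1000"),
}
before = [p.name for p in mnt_target_candidates(procs, 1)]

# The first nested container is a child of PID 1 and has entered
# the task's mnt namespace.
procs[4] = Proc(4, 1, "ns-B", "echo echo")
after = [p.name for p in mnt_target_candidates(procs, 1)]
```

      In this model `before` contains only the task, while `after` contains both the task and the short-lived nested container, so a lookup that picks any matching process may select a PID that is about to disappear.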

      Attachments

        1. pstree.png
          46 kB
          Andrei Budnik


          People

            abudnik Andrei Budnik
            Alex R Alex R