  1. Mesos
  2. MESOS-8313

Provide a host namespace container supervisor.

Details

    • Type: Improvement
    • Status: Accepted
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: containerization
    • Labels: None

    Description

      After more investigation on user namespaces, the current implementation of creating the container namespaces needs some adjustment before we can implement user namespaces in a useable fashion.

      The problems we need to address are:

      1. The containerizer needs to hold CAP_SYS_ADMIN over the PID namespace to mount procfs. Currently, this prevents containers joining the host PID namespace. The workaround is to always create a new container PID namespace (as a child of the user namespace) with the namespaces/pid isolator.

      2. The containerizer needs to hold CAP_SYS_ADMIN over the network namespace to mount sysfs. There's no general workaround for this since we can't generally require containers to not join the host network namespace.

      3. The containerizer can't enter a user namespace after entering the chroot. This restriction makes it impossible to keep the existing order of containerizer operations when we want the executor to be in a new user namespace that has no children (i.e. to protect the container from a privileged task).
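
The first two problems above can be modelled in a few lines. This is only a simplified sketch of the rule as described in this ticket, not Mesos code: the `Namespaces` type and the `blocked_mounts` function are hypothetical names invented for illustration, and the model assumes that a process inside a new user namespace holds CAP_SYS_ADMIN only over namespaces owned by that user namespace.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Namespaces:
    """Which namespaces the container creates; False means it joins the host's."""
    new_user: bool
    new_pid: bool
    new_net: bool


def blocked_mounts(ns: Namespaces) -> List[str]:
    """Return the pseudo-filesystems the containerizer could not mount.

    Assumption: inside a new user namespace the containerizer has no
    CAP_SYS_ADMIN over a host PID or host network namespace.
    """
    blocked = []
    if ns.new_user and not ns.new_pid:
        blocked.append("procfs")  # problem 1: host PID namespace
    if ns.new_user and not ns.new_net:
        blocked.append("sysfs")   # problem 2: host network namespace
    return blocked
```

Under this model, forcing a new PID namespace (the namespaces/pid workaround) clears procfs from the list, while sysfs stays blocked for any container that joins the host network namespace.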

      After some discussion with jieyu, we believe that we can solve most or all of these issues by creating a new containerizer supervisor that runs fully outside the container and is responsible for constructing the rootfs mount namespace, launching the containerizer to enter the rest of the container, and waiting on the entered process.

      Since this new supervisor process is not running in the user namespace, it will be able to construct the container rootfs in a new mount namespace without user namespace restrictions. We can then clone a child to fully create and enter container namespaces along with the prefabricated rootfs mount namespace.
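
One way to picture this split is by the namespace flags each process would use. The sketch below is an assumption about how the division could look, not the actual implementation: `supervisor_flags` and `child_flags` are hypothetical helpers, and the constants are the namespace flag values from `<linux/sched.h>`. The point is that the cloned child never asks for a new mount namespace, so it inherits the one the supervisor prepared.

```python
# Namespace flag values from <linux/sched.h>.
CLONE_NEWNS = 0x00020000    # mount namespace
CLONE_NEWUSER = 0x10000000  # user namespace
CLONE_NEWPID = 0x20000000   # PID namespace
CLONE_NEWNET = 0x40000000   # network namespace


def supervisor_flags() -> int:
    """Flags the supervisor would unshare before building the rootfs.

    Only the mount namespace is created here, while the supervisor is
    still in the root user namespace.
    """
    return CLONE_NEWNS


def child_flags(new_user: bool, new_pid: bool, new_net: bool) -> int:
    """Flags for the child cloned by the supervisor.

    CLONE_NEWNS is deliberately absent: the child shares the mount
    namespace the supervisor already constructed (an assumption made
    for this sketch).
    """
    flags = 0
    if new_user:
        flags |= CLONE_NEWUSER
    if new_pid:
        flags |= CLONE_NEWPID
    if new_net:
        flags |= CLONE_NEWNET
    return flags
```

For example, `child_flags(True, True, False)` describes a container with new user and PID namespaces that still joins the host network namespace.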

      The only drawback to this approach is that the container's mount namespace will be owned by the root user namespace rather than the container user namespace. We are OK with this for now.

      The plan here is to retain the existing mesos-containerizer launch subcommand and add a new mesos-containerizer supervise subcommand, which will be its parent process. This new subcommand will be used for the default executor and custom executor code paths.
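
The parent/child relationship between the two subcommands can be sketched as a plain fork, exec and wait loop. This is a minimal illustration only: the `supervise` function below is hypothetical, it does no namespace or rootfs setup, and the command it runs is whatever the caller passes in (the real supervisor would run mesos-containerizer launch with arguments that are not shown in this ticket).

```python
import os


def supervise(argv):
    """Fork, exec argv in the child, wait for it, and return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process with the supervised command.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Reached only if exec failed.
            os._exit(127)

    # Parent: block until the child terminates.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        # Shell-style convention for a child killed by a signal.
        return 128 + os.WTERMSIG(status)
    return os.WEXITSTATUS(status)
```

A caller could then propagate the child's result, for example `exit_code = supervise(["sh", "-c", "exit 7"])`, where the shell command is a stand-in for the launch subcommand.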

      Attachments

        1. IMG_2629.JPG
          1.14 MB
          James Peach

            People

              Assignee: James Peach (jamespeach)
              Reporter: James Peach (jamespeach)
              Votes: 0
              Watchers: 2
