  1. Mesos
  2. MESOS-816

Allow delegation to shell scripts for isolation


Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • None
    • None
    • agent, containerization
    • None

    Description

      Being able to delegate isolation to shell scripts could make it easier to leverage the machinery provided by the LXC tools, LibVirt, VirtualBox, Docker and similar containerization systems.

      Why go through command line tools for isolation? We have seen many requests for isolation, covering a wide variety of scenarios:

      • Setups requiring multiple versions of the same language (Ruby 1.8, Ruby 1.9).
      • Setups requiring installation and configuration of RPM-packaged applications.
      • Build-and-test setups, where sharing the environment of the host would impact reproducibility.
      • Integration of 3rd party, service-oriented applications.
      • Launching applications with Docker.
      • Launching multiple instances of a Mesos framework that, like Hadoop, has significant system setup and dependencies.

      To cover these and other use cases, it seems reasonable to allow Mesos to delegate to external programs for isolation:

      • It makes it easier to experiment with new containerization tools.
      • It allows for site administrators to customize containerization, or even implement new containerization mechanisms, without impacting their ability to keep pace with Mesos development.
      • Many external programs exist for containerization – Docker, LXC tools, LibVirt – which handle a great deal of the book-keeping around finding and efficiently cloning disk images and setting up the guest system (its hostname, TTYs, /dev/*, /proc).

      The scenarios listed above can be understood in terms of three use cases:

      • The containerized system service scenario, wherein an application, installed with RPM or a similar tool, is started and managed by the init system within a container. Percona MySQL is an example of such an application.
      • The containerized application scenario, wherein an application is installed or unpacked and then configured and launched in a single command. For example, running a custom Rails app with bundle install && bundle exec rails.
      • The containerized framework/executor scenario, wherein the application is Spark, Hadoop or another Mesos framework/executor pair.

      One way to achieve this could be to introduce an External Isolator, which works in parallel with the existing process/posix and cgroups isolators. The responsibility of this isolator would be to act as a thin layer to external isolators. Calls for task launching, stopping or any other resource change would be serialized and passed to the external isolators by the Mesos External Isolator.
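The "thin layer" idea can be illustrated with a minimal sketch, assuming the call is serialized as JSON and written to the external program's standard input. The JSON layout, the action names and the `delegate` helper are hypothetical and are not part of Mesos; the echo command only stands in for a site-provided script.

```python
import json
import subprocess
import sys


def delegate(program, action, payload):
    """Serialize one isolator call as JSON and hand it to an external program.

    `program` is an argv list for the external isolator. The JSON wire format
    and the action names are assumptions made for illustration only.
    """
    request = json.dumps({"action": action, "payload": payload})
    result = subprocess.run(
        program,
        input=request,
        capture_output=True,
        text=True,
        check=True,  # a non-zero exit status is treated as a failed call
    )
    return result.stdout


# Stand-in for a site-provided shell script: it echoes the request back.
ECHO_ISOLATOR = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read())",
]

if __name__ == "__main__":
    print(delegate(ECHO_ISOLATOR, "launch", {"task_id": "t1", "cpus": 0.5}))
```

Keeping the contract to "one serialized request in, exit status and output back" is what would let a plain shell script play the role of the external isolator.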

      Allowing for pluggable isolators invites the possibility of having different isolators per task. For applications using containers, it's reasonable that each application or framework can specify a different base image, which would be an option passed to the corresponding isolator. One can also imagine specialized frameworks that need to disable isolation entirely. For example, a "system backup" framework could specify a null isolator, allowing it to snapshot interesting data on each slave and transfer it to a sanctioned storage location.

      However, having users and framework authors specify isolators would both harm portability and make isolation their problem, rather than something handled transparently by Mesos. It would also have the unintended effect of putting them at odds with site administrators, who would specify isolators as well, as a command line option for each slave.

      Allowing tasks to carry a more abstract notion of "container" with them would allow for most application level scenarios we've outlined above. Theoretically, more than one isolator might be able to handle a given container. For example, if the container is specified as an "ISO" and a distro LiveCD is provided, one could imagine a Docker isolator, LXC isolator or VirtualBox isolator handling it. Encouraging users and framework authors to specify a container would be simpler for them than specifying isolator flags, would allow them to document their intent more clearly, and would reduce the scope for conflict with other parties who have an interest in upgrading and tuning isolation. It also makes applications and command examples more portable, by decoupling the isolation mechanism from the desired container layout (which is, more or less, a chroot with some files in it).
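As a rough sketch of how a slave might pick among several isolators able to handle the same container, assume the site administrator lists isolators in order of preference together with the container kinds each accepts. The table, the kind names and `choose_isolator` are hypothetical.

```python
# Isolators in the site administrator's order of preference, each with the
# container kinds it accepts. Both the names and the kinds are assumptions.
SITE_ISOLATORS = [
    ("docker", {"docker", "iso"}),
    ("lxc", {"lxc", "iso"}),
    ("virtualbox", {"iso"}),
]


def choose_isolator(kind, isolators=SITE_ISOLATORS):
    """Return the first configured isolator that accepts this container kind."""
    wanted = kind.lower()
    for name, accepted in isolators:
        if wanted in accepted:
            return name
    raise LookupError(f"no isolator on this slave handles {kind!r} containers")
```

The task only states the kind of container it needs; which mechanism ends up serving it stays a site-level decision.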

      To this end, we propose adding an optional ContainerInfo to each CommandInfo:

      message CommandInfo {
        message ContainerInfo {
          required bytes image = 1;
          repeated bytes options = 2;
        }

        ...

        optional ContainerInfo container = 4;
      }

      The first field of the ContainerInfo should indicate the image, perhaps as a URL. For example:

      docker:///johncosta/redis
      iso+http://mirrors.kernel.org/knoppix/KNOPPIX_V7.2.0CD-2013-06-16-EN.iso
      lxc:///ubuntu

      The scheme of the URL – recognizable as a string of letters and digits and perhaps plusses, dots and dashes preceding the first `://`, per RFC 3986 – serves to indicate the type of the container, which isolators can use to determine both what to do with a container and how to obtain it. For the Docker URL type, for example, the absence of a host between the second and third slashes could be interpreted to mean that the image should be fetched from the Docker index or from a locally configured default Docker image server; whereas if a hostname is given, it is treated as the image server to use.
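The scheme and host rules described above can be sketched with a small parser. `ImageRef` and `parse_image` are hypothetical names, and treating an empty host as "use the default image server" follows the Docker interpretation suggested in the text rather than any existing Mesos behavior.

```python
import re
from typing import NamedTuple, Optional

# RFC 3986 scheme: a letter followed by letters, digits, "+", "-" or ".".
_IMAGE_URL = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*)://([^/]*)(/.*)?$")


class ImageRef(NamedTuple):
    scheme: str          # container type, e.g. "docker" or "iso+http"
    host: Optional[str]  # None when the URL names no image server
    path: str            # image name or location on the server


def parse_image(url: str) -> ImageRef:
    """Split a container image URL into scheme, optional host and path."""
    match = _IMAGE_URL.match(url)
    if match is None:
        raise ValueError(f"not a container image URL: {url!r}")
    scheme, host, path = match.groups()
    return ImageRef(scheme.lower(), host or None, path or "/")
```

With this reading, `docker:///johncosta/redis` has no host, so an isolator would fall back to its default image server, while the `iso+http` example names `mirrors.kernel.org` explicitly.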

      The addition of "options" to the ContainerInfo poses a risk to portability and warrants both explanation and justification. In the case of Docker URLs, for example, it is possible to mount additional filesystems on the Docker command line; and these filesystems can even be indicated by reference to another Docker container by name. Support for this feature is clearly tied to the Docker URL and its meaning.

      When the default isolator for a slave is specified, there may also be a default container specified. It is good for us, then, that the ContainerInfo structure maps cleanly to an array of byte strings, since this is an easy thing to handle from the command line.
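One way to picture the mapping to an array of byte strings is sketched below, assuming a convention in which the first element is the image and the remaining elements are the options. The convention and both helper names are assumptions; the proposal does not define a command line syntax.

```python
from typing import List, Sequence, Tuple


def container_to_args(image: bytes, options: Sequence[bytes] = ()) -> List[bytes]:
    """Flatten a ContainerInfo into a list: the image first, then each option."""
    return [image, *options]


def container_from_args(args: Sequence[bytes]) -> Tuple[bytes, List[bytes]]:
    """Rebuild (image, options) from the flat list; the image is required."""
    if not args:
        raise ValueError("a container needs at least an image")
    return args[0], list(args[1:])
```

Because the image field is required and the options field is repeated, the round trip loses nothing.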

      Now in practice, how will we use the ContainerInfo? In the three use cases outlined above – service container, command container and containerized executor – tasks needing a special container will specify an ExecutorInfo in the TaskInfo and not a bare CommandInfo. The ContainerInfo would then be part of the CommandInfo embedded in the ExecutorInfo.

      To consider a specific case, were the Storm framework packaged in a container, then the same container could be used both for Nimbus and the worker nodes:

      • Nimbus would be launched with a TaskInfo requesting the container and launching Nimbus.

      TaskInfo {
        executor = ExecutorInfo {
          command = CommandInfo {
            value = "python /opt/storm/bin/storm go"
            container = ContainerInfo {
              image = "docker:///storm-mesos/latest"
              options = [ "-p", "1337:8000" ]
            }
          }
          ...
        }
        ...
      }

      • Nimbus would launch executors with a TaskInfo requesting the very same container, but specifying a different command.

      TaskInfo {
        executor = ExecutorInfo {
          command = CommandInfo {
            value = "curl -sSfL http://storm.server:1337/conf/storm.yaml -o /opt/storm/conf/storm.yaml && python /opt/storm/bin/storm supervisor storm.mesos.MesosSupervisor"
            container = ContainerInfo {
              image = "docker:///storm-mesos/latest"
            }
          }
          ...
        }
        ...
      }
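The two Storm launches differ only in the command and the options, which a short sketch can make explicit. Plain dictionaries stand in for the protobuf messages here, the `container` key follows the field name in the proposed CommandInfo, and `storm_task` is a hypothetical helper.

```python
STORM_IMAGE = "docker:///storm-mesos/latest"


def storm_task(command, options=()):
    """Build a TaskInfo-shaped dict that runs `command` in the Storm image."""
    container = {"image": STORM_IMAGE}
    if options:
        container["options"] = list(options)
    return {"executor": {"command": {"value": command, "container": container}}}


# Nimbus: same image, with a port mapping passed through as options.
nimbus = storm_task("python /opt/storm/bin/storm go", ["-p", "1337:8000"])

# Supervisor: same image, different command, no options.
supervisor = storm_task(
    "curl -sSfL http://storm.server:1337/conf/storm.yaml"
    " -o /opt/storm/conf/storm.yaml"
    " && python /opt/storm/bin/storm supervisor storm.mesos.MesosSupervisor"
)
```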

      While in the near term we expect container URLs to be pretty specific to the containerization mechanism, let us hope for a glorious future with URLs like `img:///ubuntu-13.04` that point to well-known, portable images.

      Attachments

        1. mesos-shell-isolator.jpg
          1.00 MB
          Jason Dusek

        Activity

          People

            Unassigned Unassigned
            solidsnack Jason Dusek

              Time Tracking

                Estimated: 72h
                Remaining: 72h
                Logged: Not Specified