AURORA-1790

Aurora CNI Support

    Details

    • Type: Epic
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Epic Name: CNI

      Description

      The Container Network Interface (CNI) is a plug-in based networking solution for containers. CNI is supported by the Mesos Unified Containerizer.

      CNI support in Aurora would enable cluster operators to isolate tasks at the network level. This includes features such as IP-per-container, or security policies ensuring that only designated subsets of containers can communicate with each other. Both are important features for multi-tenant environments.

      Mesos Protobufs

      In order to launch a task using CNI, Mesos requires frameworks to populate the NetworkInfo protobuf. The following shows the relevant fields:

      /**
       * Describes a container configuration and allows extensible
       * configurations for different container implementations.
       *
       * NOTE: In the Aurora case, this is set as part of ExecutorInfo
       */
      message ContainerInfo {
        ...
        // A list of network requests. A framework can request multiple IP addresses
        // for the container.
        repeated NetworkInfo network_infos = 7;
        ...
      }
      
      /**
       * Describes a network request from a framework as well as network resolution
       * provided by Mesos.
       */
      message NetworkInfo {
        ...
        // For the CNI case, empty during task/executor launch and only used
        // in TaskStatus messages to inform the framework scheduler about
        // the IP addresses bound to a container
        repeated IPAddress ip_addresses = 5;
      
        // Name of the network which will be used by network isolator to determine
        // the network that the container joins. It's up to the network isolator
        // to decide how to interpret this field.
        optional string name = 6;
      
        // To tag certain metadata to be used by Isolator/IPAM, e.g., rack, etc.
        // Opaque to Mesos but interpreted by the CNI plugin
        optional Labels labels = 4;
        ...
      }
      
      /**
       * Container related information that is resolved during container
       * setup. The information is sent back to the framework as part of the
       * TaskStatus message.
       */
      message ContainerStatus {
        // This field can be reliably used to identify the container IP address.
        repeated NetworkInfo network_infos = 1;
        ...
      }
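
      As a rough illustration of the launch-time request described above, the following sketch builds dicts that mirror the JSON form of the NetworkInfo and ContainerInfo fields shown in the excerpt. The helper names build_network_info and build_container_info are hypothetical and not part of Mesos or Aurora, and only the fields from the excerpt are modelled.

```python
from typing import Dict, List, Optional


def build_network_info(name: str, labels: Optional[Dict[str, str]] = None) -> dict:
    """Return a NetworkInfo-shaped dict for a CNI network request (hypothetical helper)."""
    info: dict = {"name": name}
    if labels:
        # Labels are opaque to Mesos and interpreted by the CNI plugin.
        info["labels"] = {
            "labels": [{"key": k, "value": v} for k, v in sorted(labels.items())]
        }
    # ip_addresses is deliberately left unset: for CNI it is only filled in
    # by Mesos in later TaskStatus messages.
    return info


def build_container_info(networks: List[dict]) -> dict:
    """Wrap one or more network requests; other ContainerInfo fields are omitted here."""
    return {"network_infos": list(networks)}


if __name__ == "__main__":
    request = build_container_info([
        build_network_info("tenant-a", {"rack": "r1"}),
        build_network_info("tenant-b"),
    ])
    print(request)
```

      A task joining two networks simply carries two entries in network_infos, which is what makes multiple IP addresses per task possible.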
      

      Challenges

      • In contrast to ports or other resources, this is the first time an important detail is only discovered asynchronously after a task has been launched, i.e. the scheduler will only learn about the IP addresses of the launched task after having received its first TaskStatus.
      • A task can now live in multiple networks and can have multiple IP addresses.
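
      Both challenges can be sketched together: the IP addresses are unknown until a status update arrives, and each task may report several networks with several addresses each. The example below operates on plain dicts shaped like the JSON form of TaskStatus; TaskIpRegistry and extract_ips are hypothetical names used for illustration only and are not existing Aurora classes.

```python
from typing import Dict, List


def extract_ips(task_status: dict) -> Dict[str, List[str]]:
    """Map network name -> IP addresses found in a TaskStatus-shaped dict."""
    result: Dict[str, List[str]] = {}
    container_status = task_status.get("container_status", {})
    for network in container_status.get("network_infos", []):
        name = network.get("name", "")
        for address in network.get("ip_addresses", []):
            ip = address.get("ip_address")
            if ip:
                result.setdefault(name, []).append(ip)
    return result


class TaskIpRegistry:
    """Hypothetical in-memory store; a task has no known IPs until its first status."""

    def __init__(self) -> None:
        self._ips: Dict[str, Dict[str, List[str]]] = {}

    def on_status_update(self, task_id: str, task_status: dict) -> None:
        ips = extract_ips(task_status)
        if ips:
            # Only overwrite when the update actually carries addresses.
            self._ips[task_id] = ips

    def lookup(self, task_id: str) -> Dict[str, List[str]]:
        return self._ips.get(task_id, {})


if __name__ == "__main__":
    registry = TaskIpRegistry()
    print(registry.lookup("task-1"))  # nothing known right after launch
    registry.on_status_update("task-1", {
        "state": "TASK_RUNNING",
        "container_status": {"network_infos": [
            {"name": "net-a", "ip_addresses": [{"ip_address": "10.0.0.5"}]},
        ]},
    })
    print(registry.lookup("task-1"))
```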

      Necessary Changes

      In order to implement CNI support in Aurora, several changes across the entire code base are needed.

      Mesos

      • As of today, there appears to be no reliable way to discover CNI-assigned IPs from within an executor (see MESOS-6281). This is crucial for us, as Thermos is responsible for announcing itself in ZooKeeper serversets.

      Thermos

      • The Observer UI needs to be updated to handle multiple IP addresses.
      • The ZK serverset announcement needs to be adjusted to publish all IP addresses.
      • A replacement for, or addition to, the pystachio variable {{mesos.hostname}} is required so that user code can discover its current IP addresses. This relates to MESOS-6281.
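
      This issue does not specify how the announcement should publish several addresses. One possible approach, shown here purely as an assumption, is to register one serverset member per IP address while keeping the usual member layout (serviceEndpoint, additionalEndpoints, status, shard). The function serverset_members is a hypothetical helper, not part of Thermos.

```python
import json
from typing import Dict, List


def serverset_members(ips: List[str], primary_port: int,
                      extra_ports: Dict[str, int], shard: int) -> List[str]:
    """Build one JSON serverset member per IP address (assumed layout, see lead-in)."""
    members = []
    for ip in ips:
        member = {
            "serviceEndpoint": {"host": ip, "port": primary_port},
            "additionalEndpoints": {
                name: {"host": ip, "port": port}
                for name, port in sorted(extra_ports.items())
            },
            "status": "ALIVE",
            "shard": shard,
        }
        members.append(json.dumps(member, sort_keys=True))
    return members


if __name__ == "__main__":
    for payload in serverset_members(["10.0.0.5", "10.1.0.7"], 31000, {"http": 31000}, 0):
        print(payload)
```

      Whether one member per address or a single member carrying all addresses is preferable depends on how serverset consumers resolve endpoints, which this sketch leaves open.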

      Aurora Scheduler

      • Feature toggle allowing operators to enable/disable CNI support.
      • Plumbing of NetworkInfo name and labels touching Thrift API, storage, and task launch mechanism.
      • Extension of TaskStatusHandlerImpl and StateManager storage layer to persist received IP addresses.

      Aurora Client

      • Extension of the Pystachio configuration so that user-defined jobs can join operator-enabled networks.

              People

              • Assignee: Unassigned
              • Reporter: Stephan Erb (StephanErb)
              • Votes: 2
              • Watchers: 9
