Uploaded image for project: 'Mesos'
  1. Mesos
  2. MESOS-9836

Docker containerizer overwrites `/mesos/slave` cgroups.

    XMLWordPrintableJSON

    Details

    • Target Version/s:
    • Sprint:
      Containerization: RI-17 52, Containerization: RI-17 53
    • Story Points:
      5

      Description

      The following bug was observed on our internal testing cluster.

      The docker containerizer launched a container on an agent:

      I0523 06:00:53.888579 21815 docker.cpp:1195] Starting container 'f69c8a8c-eba4-4494-a305-0956a44a6ad2' for task 'apps_docker-sleep-app.1fda5b8e-7d20-11e9-9717-7aa030269ee1' (and executor 'apps_docker-sleep-app.1fda5b8e-7d20-11e9-9717-7aa030269ee1') of framework 415284b7-2967-407d-b66f-f445e93f064e-0011
      I0523 06:00:54.524171 21815 docker.cpp:783] Checkpointing pid 13716 to '/var/lib/mesos/slave/meta/slaves/60c42ab7-eb1a-4cec-b03d-ea06bff00c3f-S2/frameworks/415284b7-2967-407d-b66f-f445e93f064e-0011/executors/apps_docker-sleep-app.1fda5b8e-7d20-11e9-9717-7aa030269ee1/runs/f69c8a8c-eba4-4494-a305-0956a44a6ad2/pids/forked.pid'
      

      After the container was launched, the docker containerizer did a docker inspect on the container and cached the pid:
      https://github.com/apache/mesos/blob/0c431dd60ae39138cc7e8b099d41ad794c02c9a9/src/slave/containerizer/docker.cpp#L1764
      The pid should be slightly greater than 13716.

      The docker executor sent a TASK_FINISHED status update around 16 minutes later:

      I0523 06:16:17.287595 21809 slave.cpp:5566] Handling status update TASK_FINISHED (Status UUID: 4e00b786-b773-46cd-8327-c7deb08f1de9) for task apps_docker-sleep-app.1fda5b8e-7d20-11e9-9717-7aa030269ee1 of framework 415284b7-2967-407d-b66f-f445e93f064e-0011 from executor(1)@172.31.1.7:36244
      

      After receiving the terminal status update, the agent asked the docker containerizer to update cpu.cfs_period_uscpu.cfs_quota_us and memory.soft_limit_in_bytes of the container through the cached pid:
      https://github.com/apache/mesos/blob/0c431dd60ae39138cc7e8b099d41ad794c02c9a9/src/slave/containerizer/docker.cpp#L1696

      I0523 06:16:17.290447 21815 docker.cpp:1868] Updated 'cpu.shares' to 102 at /sys/fs/cgroup/cpu,cpuacct/mesos/slave for container f69c8a8c-eba4-4494-a305-0956a44a6ad2
      I0523 06:16:17.290660 21815 docker.cpp:1895] Updated 'cpu.cfs_period_us' to 100ms and 'cpu.cfs_quota_us' to 10ms (cpus 0.1) for container f69c8a8c-eba4-4494-a305-0956a44a6ad2
      I0523 06:16:17.889816 21815 docker.cpp:1937] Updated 'memory.soft_limit_in_bytes' to 32MB for container f69c8a8c-eba4-4494-a305-0956a44a6ad2
      

      Note that the cgroup of cpu.shares was /mesos/slave. This was possibly because that over the 16 minutes the pid got reused:

      # zgrep 'systemd.cpp:98\]' /var/log/mesos/archive/mesos-agent.log.12.gz
      ...
      I0523 06:00:54.525178 21815 systemd.cpp:98] Assigned child process '13716' to 'mesos_executors.slice'
      I0523 06:00:55.078546 21808 systemd.cpp:98] Assigned child process '13798' to 'mesos_executors.slice'
      I0523 06:00:55.134096 21808 systemd.cpp:98] Assigned child process '13799' to 'mesos_executors.slice'
      ...
      I0523 06:06:30.997439 21808 systemd.cpp:98] Assigned child process '32689' to 'mesos_executors.slice'
      I0523 06:06:31.050976 21808 systemd.cpp:98] Assigned child process '32690' to 'mesos_executors.slice'
      I0523 06:06:31.110514 21815 systemd.cpp:98] Assigned child process '32692' to 'mesos_executors.slice'
      I0523 06:06:33.143726 21818 systemd.cpp:98] Assigned child process '446' to 'mesos_executors.slice'
      I0523 06:06:33.196251 21818 systemd.cpp:98] Assigned child process '447' to 'mesos_executors.slice'
      I0523 06:06:33.266332 21816 systemd.cpp:98] Assigned child process '449' to 'mesos_executors.slice'
      ...
      I0523 06:09:34.870056 21808 systemd.cpp:98] Assigned child process '13717' to 'mesos_executors.slice'
      I0523 06:09:34.937762 21813 systemd.cpp:98] Assigned child process '13744' to 'mesos_executors.slice'
      I0523 06:09:35.073971 21817 systemd.cpp:98] Assigned child process '13754' to 'mesos_executors.slice'
      ...
      

      It was highly likely that the container itself exited around 06:09:35, way before the docker executor detected and reported the terminal status update, and then its pid was reused by another forked child of the agent, and thus cpu.cfs_period_uscpu.quota_us and memory.soft_limit_in_bytes of the /mesos/slave cgroup was mistakenly overwritten.

        Attachments

          Activity

            People

            • Assignee:
              qianzhang Qian Zhang
              Reporter:
              chhsia0 Chun-Hung Hsiao
              Shepherd:
              Gilbert Song
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: