Mesos / MESOS-8480

Mesos returns high resource usage when killing a Docker task.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.2, 1.5.0, 1.6.0
    • Component/s: containerization
    • Labels: None
    • Sprint: Mesosphere Sprint 73
    • Story Points: 2

    Description

      To get resource statistics for a Docker task, we first look up the task's cgroup path for each subsystem in /proc/<pid>/cgroup (taking the cpuacct subsystem as an example):

      9:cpuacct,cpu:/docker/66fbe67b64ad3a86c6e080e18578bc9e540e55ee0bdcae09c2e131a4264a3a3b
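
      The lookup above can be sketched as follows. This is a minimal illustration, not the Mesos implementation: findCgroup is a hypothetical helper that takes the contents of /proc/<pid>/cgroup as a string and returns the path recorded for one subsystem, assuming the "<hierarchy-id>:<controllers>:<path>" line format shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Mesos): scans the contents of
// /proc/<pid>/cgroup for `subsystem` and stores the matching cgroup path
// in `*cgroup`. Each line is "<hierarchy-id>:<ctrl1>,<ctrl2>,...:<path>".
// Returns false if no line lists the subsystem.
bool findCgroup(
    const std::string& contents,
    const std::string& subsystem,
    std::string* cgroup)
{
  std::istringstream lines(contents);
  std::string line;

  while (std::getline(lines, line)) {
    const std::size_t first = line.find(':');
    if (first == std::string::npos) {
      continue; // Malformed line, skip it.
    }

    const std::size_t second = line.find(':', first + 1);
    if (second == std::string::npos) {
      continue; // Malformed line, skip it.
    }

    // Compare each comma-separated controller name exactly, so that
    // looking up "cpu" does not match "cpuacct" or "cpuset".
    std::istringstream controllers(
        line.substr(first + 1, second - first - 1));
    std::string controller;

    while (std::getline(controllers, controller, ',')) {
      if (controller == subsystem) {
        *cgroup = line.substr(second + 1);
        return true;
      }
    }
  }

  return false;
}
```

      With the example line above, looking up "cpuacct" yields the "/docker/66fb..." path. Once the process has been moved to the root cgroup, the same lookup yields "/".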
      

      We then read /sys/fs/cgroup/cpuacct//docker/66fbe67b64ad3a86c6e080e18578bc9e540e55ee0bdcae09c2e131a4264a3a3b/cpuacct.stat to get the statistics:

      user 4
      system 0
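
      For illustration, the two counters above can be parsed as in the sketch below. CpuacctStat and parseCpuacctStat are hypothetical names, not Mesos APIs. The kernel reports these values in USER_HZ ticks, so a caller would typically divide by sysconf(_SC_CLK_TCK) to get seconds.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical holder for the two counters in cpuacct.stat (USER_HZ ticks).
struct CpuacctStat
{
  uint64_t user = 0;
  uint64_t system = 0;
};

// Hypothetical parser (not part of Mesos): reads "<key> <value>" pairs from
// the contents of cpuacct.stat. Returns true only if both "user" and
// "system" were found.
bool parseCpuacctStat(const std::string& contents, CpuacctStat* stat)
{
  std::istringstream in(contents);
  std::string key;
  uint64_t value = 0;
  bool sawUser = false;
  bool sawSystem = false;

  while (in >> key >> value) {
    if (key == "user") {
      stat->user = value;
      sawUser = true;
    } else if (key == "system") {
      stat->system = value;
      sawSystem = true;
    }
  }

  return sawUser && sawSystem;
}
```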
      

      However, when a Docker container is being torn down, it seems that Docker or the operating system first moves the process to the root cgroup before actually killing it, making /proc/<pid>/cgroup look like the following:

      9:cpuacct,cpu:/
      

      As a result, a call to cgroup::internal::cgroup() that races with the teardown returns a single '/', which in turn makes DockerContainerizerProcess::cgroupsStatistics() read /sys/fs/cgroup/cpuacct///cpuacct.stat. That file contains the statistics for the root cgroup rather than for the container:

      user 228058750
      system 24506461
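
      One possible mitigation, sketched below, is to refuse to build a statistics path when the lookup returns the root cgroup. This is an assumption about how the race could be guarded against, not a description of the fix that was committed for this issue. isRootCgroup and containerStatPath are hypothetical helpers.

```cpp
#include <cassert>
#include <string>

// Hypothetical check: a cgroup path that is empty or consists only of '/'
// characters refers to the root cgroup of the hierarchy.
bool isRootCgroup(const std::string& cgroup)
{
  return cgroup.find_first_not_of('/') == std::string::npos;
}

// Hypothetical helper: builds "<hierarchy>/<cgroup>/<file>" into `*path`.
// Returns false and leaves `*path` untouched if `cgroup` is the root
// cgroup, so that the caller can skip the sample instead of reporting
// host-wide usage as if it belonged to the container.
bool containerStatPath(
    const std::string& hierarchy,
    const std::string& cgroup,
    const std::string& file,
    std::string* path)
{
  if (isRootCgroup(cgroup)) {
    return false;
  }

  *path = hierarchy + "/" + cgroup + "/" + file;
  return true;
}
```

      With this guard, a sample taken while the process sits in the root cgroup would be dropped, and the caller could report an error or reuse the last known statistics.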
      

      This can be reproduced with the attached test.cpp by running the following command:

      $ docker run --name sleep -d --rm alpine sleep 1000; ./test $(docker inspect sleep | jq .[].State.Pid) & sleep 1 && docker rm -f sleep
      ...
      
      Reading file '/proc/44224/cgroup'
      Reading file '/sys/fs/cgroup/cpuacct//docker/1d79a6c877e2af3081630aa57d23d853e6bd7d210dad28f897556bfea20bc9c1/cpuacct.stat'
      user 4
      system 0
      
      Reading file '/proc/44224/cgroup'
      Reading file '/sys/fs/cgroup/cpuacct///cpuacct.stat'
      user 228058750
      system 24506461
      
      Reading file '/proc/44224/cgroup'
      Reading file '/sys/fs/cgroup/cpuacct///cpuacct.stat'
      user 228058750
      system 24506461
      
      Failed to open file '/proc/44224/cgroup'
      sleep
      [2]-  Exit 1                  ./test $(docker inspect sleep | jq .[].State.Pid)
      

      Attachments

        1. test.cpp
          2 kB
          Chun-Hung Hsiao

          People

            chhsia0 Chun-Hung Hsiao
            chhsia0 Chun-Hung Hsiao
            Jie Yu Jie Yu
            Votes: 0
            Watchers: 4
