  1. Mesos
  2. MESOS-9283

Docker containerizer actor can get backlogged with large number of containers.

    Details

    • Target Version/s:
    • Sprint:
      Mesosphere RI-6 Sprint 2018-30, Mesosphere RI-6 Sprint 2018-31
    • Story Points:
      3

      Description

      We observed this during internal scale testing.

      When launching 300+ Docker containers on a single agent box, the Docker containerizer actor can get backlogged. As a result, API calls like `GET_CONTAINERS` become unresponsive. The backlog also blocks the Mesos containerizer from launching containers if the agent is started with `--containerizers=docker,mesos`, because the composing containerizer invokes the Docker containerizer's launch first, and that call sits in the queue.

      Profiling results show that the bottleneck is `os::killtree`, which is invoked whenever a Docker command is discarded (e.g., on client disconnect).

      For this particular case, `os::killtree` is not really necessary because the docker command does not fork additional subprocesses. If we use the argv version of `subprocess` to launch docker commands, we can simply use `os::kill` instead. We confirmed that, after switching to `os::kill`, the performance issue goes away and the agent can easily scale up to 300+ containers.
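
A minimal sketch of the idea above, written with plain POSIX calls rather than libprocess's `subprocess` or stout's `os::kill` (this is not the actual patch). The helper names `launchArgv` and `killAndReap` are hypothetical. The point it illustrates: when a command is exec'd directly from an argv vector, with no intermediate `sh -c`, the pid we hold is the command itself, so a single `kill(2)` is enough and no process-tree walk is needed.

```cpp
#include <cassert>
#include <csignal>
#include <string>
#include <vector>

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: start a program from an argv vector. No shell
// is involved, so the returned pid is the program itself.
pid_t launchArgv(const std::vector<std::string>& args)
{
  assert(!args.empty());

  // Build the NULL-terminated argv array before forking.
  std::vector<char*> argv;
  for (const std::string& arg : args) {
    argv.push_back(const_cast<char*>(arg.c_str()));
  }
  argv.push_back(nullptr);

  pid_t pid = fork();
  if (pid == 0) {
    execvp(argv[0], argv.data());
    _exit(127); // Only reached if exec failed.
  }

  return pid; // -1 if fork failed.
}

// Hypothetical helper: signal exactly one process and reap it.
// Returns the terminating signal number, or -1 on error or if the
// child exited on its own.
int killAndReap(pid_t pid, int signo = SIGKILL)
{
  if (kill(pid, signo) != 0) {
    return -1;
  }

  int status = 0;
  if (waitpid(pid, &status, 0) != pid) {
    return -1;
  }

  return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

For example, `killAndReap(launchArgv({"sleep", "30"}))` signals only the `sleep` process. Had the command been launched through a shell, the pid would belong to the shell and its children would need a tree kill to clean up.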

              People

              • Assignee:
                greggomann Greg Mann
                Reporter:
                jieyu Jie Yu
                Shepherd:
                Jie Yu
              • Votes:
                0
                Watchers:
                6
