Mesos / MESOS-9283

Docker containerizer actor can get backlogged with large number of containers.


Details

    • Mesosphere RI-6 Sprint 2018-30, Mesosphere RI-6 Sprint 2018-31
    • 3

    Description

      We observed this issue during internal scale testing.

      When launching 300+ Docker containers on a single agent box, the Docker containerizer actor can become backlogged. As a result, API calls such as `GET_CONTAINERS` become unresponsive. The backlog also blocks the Mesos containerizer from launching containers when the agent runs with `--containerizers=docker,mesos`, because the composing containerizer invokes the Docker containerizer's launch first, and that call is queued behind the backlog.
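
      To illustrate why a backlog in one actor delays unrelated requests, here is a toy model of a mailbox that runs handlers one at a time in arrival order. It is a sketch only: `ToyActor` and `simulate` are hypothetical names, and nothing here is libprocess or Mesos code.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy actor: handlers are queued and executed strictly one at a
// time, in the order they were dispatched.
class ToyActor
{
public:
  void dispatch(std::function<void()> handler)
  {
    assert(handler); // Reject empty handlers.
    mailbox_.push_back(std::move(handler));
  }

  std::size_t backlog() const { return mailbox_.size(); }

  // Runs every queued handler in FIFO order.
  void drain()
  {
    while (!mailbox_.empty()) {
      std::function<void()> handler = std::move(mailbox_.front());
      mailbox_.pop_front();
      handler();
    }
  }

private:
  std::deque<std::function<void()>> mailbox_;
};

// Queues `discardedCommands` cleanup handlers followed by one API request and
// returns the order in which the handlers actually ran.
std::vector<std::string> simulate(int discardedCommands)
{
  ToyActor actor;
  std::vector<std::string> order;

  for (int i = 0; i < discardedCommands; ++i) {
    actor.dispatch([&order]() { order.push_back("cleanup"); });
  }
  actor.dispatch([&order]() { order.push_back("GET_CONTAINERS"); });

  actor.drain();
  return order;
}
```

      In this model the API request only runs after every earlier cleanup handler has finished, so the cost of each cleanup adds directly to the request's latency.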

      Profiling shows that the bottleneck is `os::killtree`, which is invoked when Docker commands are discarded (e.g., on client disconnect).

      In this particular case, `os::killtree` is not really necessary because the docker command does not fork additional subprocesses. If we use the argv version of `subprocess` to launch docker commands, we can simply use `os::kill` instead. We confirmed that switching to `os::kill` makes the performance issue go away, and the agent can easily scale to 300+ containers.
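
      The idea can be sketched with plain POSIX calls rather than the stout/libprocess helpers named above. The sketch assumes a POSIX system; `launchArgv` and `killAndReap` are hypothetical names, not Mesos APIs. Because the command is exec'd directly from an argv array, with no intermediate shell, the returned pid is the command itself, so a single `kill(2)` reaches it and no process-tree walk is needed.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks and execs argv[0] directly (no `sh -c` wrapper).
// Returns the child's pid, or -1 if fork failed.
pid_t launchArgv(char* const argv[])
{
  assert(argv != nullptr && argv[0] != nullptr);

  pid_t pid = fork();
  if (pid == 0) {
    execvp(argv[0], argv);
    _exit(127); // Only reached if exec failed.
  }
  return pid;
}

// Hypothetical helper: sends one signal to one pid, then reaps it.
// Returns the signal that terminated the child, 0 if it exited on its own,
// or -1 on error.
int killAndReap(pid_t pid, int sig)
{
  if (kill(pid, sig) != 0) {
    return -1;
  }

  int status = 0;
  pid_t reaped;
  do {
    reaped = waitpid(pid, &status, 0);
  } while (reaped == -1 && errno == EINTR);

  if (reaped != pid) {
    return -1;
  }
  return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

      For example, launching `sleep 30` through `launchArgv` and then calling `killAndReap(pid, SIGKILL)` terminates the one process that was started. This only holds while the launched command does not spawn children of its own, which is the condition the description states for the docker command.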

            People

              greggomann Greg Mann
              jieyu Jie Yu
              Jie Yu Jie Yu
              Votes: 0
              Watchers: 5