[MESOS-9283] Docker containerizer actor can get backlogged with large number of containers. - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Done
Affects Version/s: 1.4.2, 1.5.1, 1.6.1, 1.7.0
Fix Version/s: 1.4.3, 1.5.2, 1.6.2, 1.7.1, 1.8.0
Component/s: containerization
Labels:
- perfomance

Target Version/s: 1.4.3, 1.5.2, 1.6.2, 1.7.1, 1.8.0
Sprint: Mesosphere RI-6 Sprint 2018-30, Mesosphere RI-6 Sprint 2018-31
Story Points: 3

Description

We observed this during internal scale testing.

When launching 300+ Docker containers on a single agent box, the Docker containerizer actor can get backlogged. As a result, API calls like `GET_CONTAINERS` become unresponsive. The backlog also blocks the Mesos containerizer from launching containers if the agent is started with `--containerizers=docker,mesos`, because the composing containerizer invokes the Docker containerizer's launch first, and that call is queued behind the backlog.

Profiling results show that the bottleneck is `os::killtree`, which will be invoked when the Docker commands are discarded (e.g., client disconnect, etc.).

For this particular case, `os::killtree` is not really necessary because the docker command does not fork additional subprocesses. If we use the argv version of `subprocess` to launch docker commands, we can simply use `os::kill` instead. We confirmed that, by switching to `os::kill`, the performance issue goes away, and the agent can easily scale up to 300+ containers.
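To illustrate why the argv form makes a single kill sufficient, here is a minimal POSIX sketch. It is not the Mesos or stout code: `launchArgv` and `killAndReap` are hypothetical helpers built on plain `fork`/`execvp`/`kill`. Because the command is exec'd directly rather than through `sh -c`, the pid returned by `fork` is the command itself, so no process-tree walk is needed to terminate it.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <string>
#include <vector>

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not a Mesos API): fork and exec `argv` directly,
// with no intermediate shell, so the returned pid is the command itself.
// Returns -1 if fork fails.
pid_t launchArgv(const std::vector<std::string>& argv)
{
  assert(!argv.empty());

  // Build the NULL-terminated argument array before forking.
  std::vector<char*> args;
  for (const std::string& arg : argv) {
    args.push_back(const_cast<char*>(arg.c_str()));
  }
  args.push_back(nullptr);

  pid_t pid = ::fork();
  if (pid == 0) {
    ::execvp(args[0], args.data());
    ::_exit(127); // Only reached if exec failed.
  }

  return pid;
}

// Hypothetical helper: send SIGKILL to exactly one process and reap it.
// Returns the signal that terminated the child, or -1 on error.
int killAndReap(pid_t pid)
{
  if (::kill(pid, SIGKILL) != 0) {
    return -1;
  }

  int status = 0;
  while (::waitpid(pid, &status, 0) == -1) {
    if (errno != EINTR) {
      return -1;
    }
  }

  return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

For example, `killAndReap(launchArgv({"sleep", "30"}))` terminates the `sleep` process with one signal. Had the command been launched through a shell, the pid would belong to the shell, and a tree-walking kill would be needed to reach the actual command.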

Attachments

Screen Shot 2018-10-01 at 10.54.18 PM.png
02/Oct/18 05:54
510 kB
Jie Yu

Issue Links

relates to

MESOS-9279 Docker Containerizer 'usage' call might be expensive if mount table is big.

Resolved

MESOS-9268 Hitting agent's `/containers` endpoint might backlog Docker containerizer process.

Resolved

People

Assignee: Greg Mann

Reporter: Jie Yu

Shepherd: Jie Yu

Votes: 0

Watchers: 5

Dates

Created: 02/Oct/18 05:51

Updated: 17/Oct/18 18:33

Resolved: 17/Oct/18 18:33