Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
- Sprint: Mesosphere Sprint 2018-26, Mesosphere Sprint 2018-27, Mesosphere Sprint 2018-28
- Story Points: 8
Description
When launching containers at scale on a single agent, a container can occasionally get stuck in ISOLATING. Such a container becomes un-destroyable, because the containerizer's destroy always waits for isolate() to finish before continuing.
We added more logging to debug this issue:
Aug 10 17:23:28 ip-10-0-1-129.us-west-2.compute.internal mesos-agent[2974]: I0810 17:23:28.050068 2995 collect.hpp:271] $$$$: AwaitProcess waited invoked for ProcessBase ID: __await__(26651); futures size: 3; future: Ready; future index: 2; ready count: 1
Aug 10 17:23:28 ip-10-0-1-129.us-west-2.compute.internal mesos-agent[2974]: I0810 17:23:28.414436 2998 collect.hpp:271] $$$$: AwaitProcess waited invoked for ProcessBase ID: __await__(26651); futures size: 3; future: Ready; future index: 0; ready count: 2
which shows that the await() in CNI::attach() is stuck on the second future (the io::read() of the plugin's stdout); a simplified sketch of this pattern follows.
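For context, the await in question looks roughly like this (a simplified sketch of the pattern in NetworkCniIsolatorProcess::attach(), with the plugin invocation and result handling omitted; the helper name awaitPlugin is ours and the exact types are approximate):

```cpp
#include <string>
#include <tuple>

#include <process/collect.hpp>
#include <process/future.hpp>
#include <process/io.hpp>
#include <process/subprocess.hpp>

#include <stout/option.hpp>

using process::Future;
using process::Subprocess;

// The result is only ready once the plugin has exited AND both of its
// output pipes have reached EOF.
Future<std::tuple<
    Future<Option<int>>,      // future 0: plugin exit status
    Future<std::string>,      // future 1: plugin stdout (the one stuck here)
    Future<std::string>>>     // future 2: plugin stderr
awaitPlugin(const Subprocess& s)
{
  return process::await(
      s.status(),
      process::io::read(s.out().get()),
      process::io::read(s.err().get()));
}
```

Because the containerizer's destroy waits on this same await(), a stdout read that never reaches EOF wedges the container in ISOLATING.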
Looking at the fd of this stdout:
Aug 10 17:23:27 ip-10-0-1-129.us-west-2.compute.internal mesos-agent[2974]: I0810 17:23:27.657501 2995 cni.cpp:1287] !!!!: Start to await for plugin '/opt/mesosphere/active/mesos/libexec/mesos/mesos-cni-port-mapper' to finish for container 1c8abf4c-f71a-4704-9a73-1ab0dd709c62 with pid '16644'; stdout fd: 1781; stderr fd: 1800
We found:
core@ip-10-0-1-129 ~ $ ps aux | grep mesos-agent
core 1674 0.0 0.0 6704 864 pts/0 S+ 20:00 0:00 grep --colour=auto mesos-agent
root 2974 16.4 2.5 1211096 414048 ? Ssl 17:02 29:11 /opt/mesosphere/packages/mesos--61265af3be37861f26b657c1f9800293b86a0374/bin/mesos-agent
core@ip-10-0-1-129 ~ $ sudo ls -al /proc/2974/fd/ | grep 1781
l-wx------. 1 root root 64 Aug 10 19:38 1781 -> /var/lib/mesos/slave/meta/slaves/d3089315-8e34-40b4-b1f7-0ac6a624d7db-S0/frameworks/d3089315-8e34-40b4-b1f7-0ac6a624d7db-0000/executors/test2.d820326d-9cc1-11e8-9809-ee15da5c8980/runs/38e9270d-ebda-4758-ad96-40c5b84bffdc/tasks/test2.d820326d-9cc1-11e8-9809-ee15da5c8980/task.updates
core@ip-10-0-1-129 ~ $ ps aux | grep 27981
core 2201 0.0 0.0 6704 884 pts/0 S+ 20:06 0:00 grep --colour=auto 27981
root 27981 0.0 0.0 1516 4 ? Ss 17:25 0:00 sleep 10000
core@ip-10-0-1-129 ~ $ cat /proc/s^C
core@ip-10-0-1-129 ~ $ sudo -s
ip-10-0-1-129 core # ls -al /proc/27981/fd | grep 275230
lr-x------. 1 root root 64 Aug 10 20:05 1781 -> pipe:[275230]
l-wx------. 1 root root 64 Aug 10 20:05 1787 -> pipe:[275230]
core@ip-10-0-1-129 ~ $ sudo ls -al /proc/2974/fd/ | grep pipe
lr-x------. 1 root root 64 Aug 10 17:02 11 -> pipe:[49380]
l-wx------. 1 root root 64 Aug 10 17:02 14 -> pipe:[49380]
lr-x------. 1 root root 64 Aug 10 17:02 17 -> pipe:[48909]
lr-x------. 1 root root 64 Aug 10 19:38 1708 -> pipe:[275089]
l-wx------. 1 root root 64 Aug 10 19:38 1755 -> pipe:[275089]
lr-x------. 1 root root 64 Aug 10 19:38 1787 -> pipe:[275230]
l-wx------. 1 root root 64 Aug 10 17:02 19 -> pipe:[48909]
Pipe 275230 is held by the agent process and the sleep process at the same time! Because the sleep process keeps a leaked copy of the pipe's write end open, the agent's io::read() on the read end never sees EOF, so the stdout future never becomes ready.
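A minimal standalone illustration (plain POSIX, not Mesos code) of why that leaked write end matters: read() on a pipe's read end only returns 0 (EOF) once every copy of the write end, in every process, has been closed.

```cpp
#include <sys/wait.h>
#include <unistd.h>

#include <cstdio>

int main()
{
  int fds[2];
  if (pipe(fds) != 0) { perror("pipe"); return 1; }

  pid_t child = fork();
  if (child == 0) {
    // Child: inherits both ends, like the leaked `sleep 10000` process.
    close(fds[0]);
    sleep(5);       // Keeps a copy of the write end open for a while.
    close(fds[1]);  // Only now can the parent's read() return EOF.
    _exit(0);
  }

  // Parent: plays the role of the agent reading the plugin's stdout.
  close(fds[1]);    // Close our own copy of the write end.

  char buf[128];
  ssize_t n;
  while ((n = read(fds[0], buf, sizeof(buf))) > 0) {
    fwrite(buf, 1, n, stdout);
  }

  // We only get here ~5 seconds later, once the child closes its copy of
  // the write end. With `sleep 10000` holding the fd, the agent's
  // io::read() would stay pending for roughly 3 hours.
  printf("EOF after the child closed its leaked write end\n");

  waitpid(child, nullptr, 0);
  return 0;
}
```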
The leak is possible because we don't use `pipe2` to create the pipe with `O_CLOEXEC` in subprocess:
https://github.com/apache/mesos/blob/1.5.x/3rdparty/libprocess/src/subprocess_posix.cpp#L61
Although we do set cloexec on those fds later:
https://github.com/apache/mesos/blob/1.5.x/3rdparty/libprocess/src/subprocess.cpp#L366-L373
There is a race where another thread forks after the `pipe()` call but before cloexec is set, so the child inherits the pipe fds across its exec. This is more likely on a busy system, which explains why it is not hard to reproduce the issue when launching a lot of containers on a single box. A sketch of the race-free alternative follows.
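For illustration, here is a minimal sketch of the race-free approach (an assumed shape, not the actual patch; createCloexecPipe is a hypothetical helper): pipe2() applies O_CLOEXEC atomically at creation time, so there is no window in which a concurrent fork() in another thread can hand unprotected descriptors to an exec'd child.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <array>
#include <cerrno>
#include <system_error>

// Hypothetical helper: returns {read fd, write fd}, both with FD_CLOEXEC set.
std::array<int, 2> createCloexecPipe()
{
  int fds[2];

#ifdef __linux__
  // Atomic: the fds are created with O_CLOEXEC already set, so a fork()
  // in another thread can never expose them to an exec'd child.
  if (::pipe2(fds, O_CLOEXEC) != 0) {
    throw std::system_error(errno, std::generic_category(), "pipe2");
  }
#else
  // Racy fallback (effectively what the buggy code path did): pipe() then
  // fcntl(FD_CLOEXEC). A fork() between the two calls leaks the fds.
  if (::pipe(fds) != 0) {
    throw std::system_error(errno, std::generic_category(), "pipe");
  }
  for (int fd : fds) {
    ::fcntl(fd, F_SETFD, FD_CLOEXEC);
  }
#endif

  return {fds[0], fds[1]};
}
```

MESOS-8128 (linked below) takes this direction by making os::pipe file descriptors O_CLOEXEC, and MESOS-9152 additionally closes all non-whitelisted fds in the posix subprocess child.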
Issue Links
- depends upon
  - MESOS-8128 Make os::pipe file descriptors O_CLOEXEC. (Resolved)
  - MESOS-9152 Close all file descriptors except whitelist_fds in posix/subprocess. (Resolved)
- is duplicated by
  - MESOS-9162 Unkillable pod container stuck in ISOLATING (Resolved)