Details
- Bug
- Status: Resolved
- Blocker
- Resolution: Fixed
- 1.5.1
- Mesosphere Sprint 2018-27, Mesosphere Sprint 2018-28, Mesosphere Sprint 2018-29
- 6
Description
A container might get stuck in the DESTROYING state if a command health check starts new nested containers while their parent container is being destroyed.
Here are some logs with unrelated lines removed. The `REMOVE_NESTED_CONTAINER`/`LAUNCH_NESTED_CONTAINER_SESSION` calls keep looping afterwards.
2018-04-16 12:37:54: I0416 12:37:54.235877 3863 containerizer.cpp:2807] Container db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133 has exited
2018-04-16 12:37:54: I0416 12:37:54.235914 3863 containerizer.cpp:2354] Destroying container db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133 in RUNNING state
2018-04-16 12:37:54: I0416 12:37:54.235932 3863 containerizer.cpp:2968] Transitioning the state of container db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133 from RUNNING to DESTROYING
2018-04-16 12:37:54: I0416 12:37:54.236100 3852 linux_launcher.cpp:514] Asked to destroy container db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133.e6e01854-40a0-4da3-b458-2b4cf52bbc11
2018-04-16 12:37:54: I0416 12:37:54.237671 3852 linux_launcher.cpp:560] Using freezer to destroy cgroup mesos/db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3/mesos/0e44d4d7-629f-41f1-80df-4aae9583d133/mesos/e6e01854-40a0-4da3-b458-2b4cf52bbc11
2018-04-16 12:37:54: I0416 12:37:54.240327 3852 cgroups.cpp:3060] Freezing cgroup /sys/fs/cgroup/freezer/mesos/db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3/mesos/0e44d4d7-629f-41f1-80df-4aae9583d133/mesos/e6e01854-40a0-4da3-b458-2b4cf52bbc11
2018-04-16 12:37:54: I0416 12:37:54.244179 3852 cgroups.cpp:1415] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3/mesos/0e44d4d7-629f-41f1-80df-4aae9583d133/mesos/e6e01854-40a0-4da3-b458-2b4cf52bbc11 after 3.814144ms
2018-04-16 12:37:54: I0416 12:37:54.250550 3853 cgroups.cpp:3078] Thawing cgroup /sys/fs/cgroup/freezer/mesos/db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3/mesos/0e44d4d7-629f-41f1-80df-4aae9583d133/mesos/e6e01854-40a0-4da3-b458-2b4cf52bbc11
2018-04-16 12:37:54: I0416 12:37:54.256599 3853 cgroups.cpp:1444] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3/mesos/0e44d4d7-629f-41f1-80df-4aae9583d133/mesos/e6e01854-40a0-4da3-b458-2b4cf52bbc11 after 5.977856ms
...
2018-04-16 12:37:54: I0416 12:37:54.371117 3837 http.cpp:3502] Processing LAUNCH_NESTED_CONTAINER_SESSION call for container 'db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133.2bfd8eed-b528-493b-8434-04311e453dcd'
2018-04-16 12:37:54: W0416 12:37:54.371692 3842 http.cpp:2758] Failed to launch container db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133.2bfd8eed-b528-493b-8434-04311e453dcd: Parent container db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133 is in 'DESTROYING' state
2018-04-16 12:37:54: W0416 12:37:54.371826 3840 containerizer.cpp:2337] Attempted to destroy unknown container db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133.2bfd8eed-b528-493b-8434-04311e453dcd
...
2018-04-16 12:37:55: I0416 12:37:55.504456 3856 http.cpp:3078] Processing REMOVE_NESTED_CONTAINER call for container 'db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133.check-f3a1238c-7f0f-4db3-bda4-c0ea951d46b6'
...
2018-04-16 12:37:55: I0416 12:37:55.556367 3857 http.cpp:3502] Processing LAUNCH_NESTED_CONTAINER_SESSION call for container 'db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133.check-0db8bd89-6f19-48c6-a69f-40196b4bc211'
...
2018-04-16 12:37:55: W0416 12:37:55.582137 3850 http.cpp:2758] Failed to launch container db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133.check-0db8bd89-6f19-48c6-a69f-40196b4bc211: Parent container db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133 is in 'DESTROYING' state
...
2018-04-16 12:37:55: W0416 12:37:55.583330 3844 containerizer.cpp:2337] Attempted to destroy unknown container db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133.check-0db8bd89-6f19-48c6-a69f-40196b4bc211
...
This stops when the framework reconciles and instructs Mesos to kill the task, which also results in the following call:
2018-04-16 13:06:04: I0416 13:06:04.161623 3843 http.cpp:2966] Processing KILL_NESTED_CONTAINER call for container 'db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133'
Nothing else related to this container is logged following this line.
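To spot this pattern in an agent log, one option is to count how often nested launches are rejected because the parent is already in the `DESTROYING` state. The following is a minimal sketch, not part of Mesos: the function names and the `threshold` parameter are hypothetical, and the regular expression assumes the warning text appears exactly as in the logs above.

```python
import re
from collections import Counter

# Assumption: the warning text matches the "Failed to launch container"
# lines quoted in this report; other Mesos versions may word it differently.
FAILED_LAUNCH = re.compile(
    r"Failed to launch container (?P<child>\S+): "
    r"Parent container (?P<parent>\S+) is in 'DESTROYING' state"
)


def count_rejected_launches(lines):
    """Map each parent container ID to its number of rejected nested launches."""
    counts = Counter()
    for line in lines:
        # finditer copes with several log entries sharing one physical line.
        for match in FAILED_LAUNCH.finditer(line):
            counts[match.group("parent")] += 1
    return counts


def stuck_candidates(lines, threshold=2):
    """Return parents with at least `threshold` rejected launches (hypothetical cutoff)."""
    counts = count_rejected_launches(lines)
    return sorted(parent for parent, n in counts.items() if n >= threshold)
```

For example, `stuck_candidates(open("agent.log"))` (with `agent.log` as a placeholder path) would list parent containers that repeatedly rejected nested launches. A repeated rejection only suggests a container may be stuck; it does not prove it.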
Issue Links
- relates to MESOS-8568 Command checks should always call `WAIT_NESTED_CONTAINER` before `REMOVE_NESTED_CONTAINER` (Resolved)