I am trying to hunt down a weird issue where restarting a Mesos agent sometimes takes down all Mesos containers. The containers die without any apparent cause:
I0821 13:35:01.486346 61392 linux_launcher.cpp:360] Recovered container 02da7be0-271e-449f-9554-dc776adb29a9
I0821 13:35:03.627367 61362 provisioner.cpp:451] Recovered container 02da7be0-271e-449f-9554-dc776adb29a9
I0821 13:35:03.701448 61375 containerizer.cpp:2835] Container 02da7be0-271e-449f-9554-dc776adb29a9 has exited
I0821 13:35:03.701453 61375 containerizer.cpp:2382] Destroying container 02da7be0-271e-449f-9554-dc776adb29a9 in RUNNING state
I0821 13:35:03.701457 61375 containerizer.cpp:2996] Transitioning the state of container 02da7be0-271e-449f-9554-dc776adb29a9 from RUNNING to DESTROYING
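To compare this sequence across all affected containers, the agent log can be grouped by container ID. Below is a minimal sketch, assuming the glog line prefix seen above (severity letter plus MMDD, time, thread ID, `file:line]`) and container IDs in plain UUID form. The function names are hypothetical helpers for illustration and are not part of Mesos.

```python
import re
from collections import defaultdict

# Assumed glog prefix: <severity><MMDD> <HH:MM:SS.micros> <thread id> <file:line>] <message>
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[^\]]+)\] (?P<msg>.*)$"
)
# Assumption: container IDs are UUIDs, as in the excerpt above.
CONTAINER_ID = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def container_timelines(lines):
    """Group parsed log lines by every container ID mentioned in the message."""
    timelines = defaultdict(list)
    for line in lines:
        m = GLOG_LINE.match(line.strip())
        if not m:
            continue  # skip continuation lines and non-glog output
        for cid in set(CONTAINER_ID.findall(m.group("msg"))):
            timelines[cid].append((m.group("time"), m.group("src"), m.group("msg")))
    return timelines


def exited_after_recovery(timelines):
    """Return container IDs whose log shows a recovery followed by an exit."""
    result = []
    for cid, events in timelines.items():
        msgs = [msg for _, _, msg in events]
        recovered = [i for i, msg in enumerate(msgs) if msg.startswith("Recovered container")]
        exited = [i for i, msg in enumerate(msgs) if msg.endswith("has exited")]
        if recovered and exited and exited[-1] > recovered[0]:
            result.append(cid)
    return sorted(result)
```

Usage would be something like `with open(path) as f: print(exited_after_recovery(container_timelines(f)))`, where `path` points at the agent log. Printing the per-container timelines side by side makes it easier to see whether every container exits at the same step of recovery.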
From the executor's perspective, there is nothing relevant in the logs. Everything just stops abruptly, as if the container were terminated externally without the executor being notified first. For further details, please see the attached agent log and one example executor log file.
I am aware that this is a long shot, but does anyone have an idea what I should be looking at to narrow down the issue?