This was reported by tan while experimenting with health checks. Many tasks were launched with the following health check, taken from the container stdout/stderr:
This should have led to all tasks getting killed, since --consecutive_failures was set. However, only some tasks were killed, while others remained running.
It turns out that the health check binary does a send and promptly exits. Unfortunately, this may lead to a message drop since libprocess may not have sent this message over the socket by the time the process exits.
We work around this in the command executor with a manual sleep, which has been around since the svn days. See here.
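The send-then-exit race described above can be illustrated with a minimal sketch. It is not libprocess code: `AsyncSender`, `report_and_flush`, and the message string are hypothetical names, and a queue drained by a daemon thread stands in for an asynchronous messaging layer. It only shows why returning from `send` does not mean the message has left the process, and how waiting for the outbox to drain before exiting avoids the drop.

```python
import queue
import threading


class AsyncSender:
    """Hypothetical stand-in for an asynchronous messaging layer."""

    def __init__(self, deliver):
        self._outbox = queue.Queue()
        self._deliver = deliver
        # Daemon thread: it is abandoned when the main thread exits,
        # so anything still in the outbox at that point is lost.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            message = self._outbox.get()
            self._deliver(message)      # e.g. write to a socket
            self._outbox.task_done()

    def send(self, message):
        # Returns immediately; the message is only queued here.
        self._outbox.put(message)

    def flush(self):
        # Block until every queued message has been handed to deliver().
        self._outbox.join()


def report_and_flush(sender, message):
    """Send a status update and wait for delivery before returning."""
    sender.send(message)
    sender.flush()


if __name__ == "__main__":
    delivered = []
    sender = AsyncSender(delivered.append)
    # Calling only sender.send(...) and then exiting could drop the message.
    report_and_flush(sender, "health-check-failed")
    print(delivered)
```

A fixed sleep before exit, like the one in the command executor, only makes delivery likely. An explicit wait on the outbox makes it certain, provided the messaging layer exposes a way to do so.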