[MESOS-8125] Agent should properly handle recovering an executor when its pid is reused - ASF JIRA

Here's how to reproduce this issue:

1. Start a task using the Docker containerizer (the same will probably happen with the command executor).
2. Stop the corresponding Mesos agent while the task is running.
3. Change the executor's checkpointed forked pid, which is located in the meta directory, e.g., /var/lib/mesos/slave/meta/slaves/latest/frameworks/19faf6e0-3917-48ab-8b8e-97ec4f9ed41e-0001/executors/foo.13faee90-b5f0-11e7-8032-e607d2b4348c/runs/latest/pids/forked.pid. I used pid 2, which is normally used by kthreadd.
4. Reboot the host.
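
The pid-editing step above can be scripted. The following is only a minimal sketch, not part of Mesos: the helper names `find_forked_pid_files` and `overwrite_forked_pid` are hypothetical, and it assumes the checkpoint file holds the pid as plain decimal text. Run it only while the agent is stopped.

```python
from pathlib import Path


def find_forked_pid_files(meta_root):
    """Return every checkpointed forked.pid file under the meta directory.

    Paths are resolved and de-duplicated so that a file reachable both
    directly and through a symlink (such as `latest`) is listed once.
    """
    return sorted({p.resolve() for p in Path(meta_root).rglob("forked.pid")})


def overwrite_forked_pid(pid_file, new_pid):
    """Write new_pid into the checkpoint file and return the previous content.

    Assumption: the file stores the pid as plain text.
    """
    pid_file = Path(pid_file)
    old_pid = pid_file.read_text().strip()
    pid_file.write_text(str(new_pid))
    return old_pid
```

For example, calling `find_forked_pid_files("/var/lib/mesos/slave/meta")` lists the candidate files, and `overwrite_forked_pid(path, 2)` on the chosen one simulates the pid being reused by kthreadd, as in the step above. The returned old value can be kept to restore the checkpoint afterwards.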

is related to

MESOS-6223 Allow agents to re-register post a host reboot

relates to

MESOS-9501 Mesos executor fails to terminate and gets stuck after agent host reboot.

MESOS-9672 Docker containerizer should ignore pids of executors that do not pass the connection check.