Affects Version/s: None
Expected result: the task successfully finishes and the agent sends a TASK_FINISHED status update.
Actual result: the task successfully finishes, but the agent sends TASK_FAILED with the reason "REASON_EXECUTOR_TERMINATED".
However, if the processing of the initial TASK_FINISHED status update is delayed, there is a chance that the Docker executor terminates first and the agent triggers a TASK_FAILED update, which is then handled before the TASK_FINISHED update.
See the attached logs, which contain an example of the race condition.
Steps to reproduce:
1. Add the following code:
2. Recompile Mesos.
3. Launch a Mesos master and agent locally.
4. Launch a simple Docker task via `mesos-execute`:
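The original code patch and command line are not included above; the commands below are a hypothetical sketch of steps 2-4, assuming a local Mesos build with the standard `mesos-master`, `mesos-agent`, and `mesos-execute` binaries (all paths, ports, and the task image/command are illustrative):

```shell
# Step 2: recompile Mesos after applying the delay-injection patch.
make -j"$(nproc)"

# Step 3: launch a master and an agent locally (work_dir paths are illustrative).
./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/tmp/mesos-master &
./bin/mesos-agent.sh --master=127.0.0.1:5050 \
  --work_dir=/tmp/mesos-agent \
  --containerizers=docker,mesos &

# Step 4: launch a simple short-lived Docker task so that the executor
# terminates shortly after sending TASK_FINISHED.
./src/mesos-execute --master=127.0.0.1:5050 \
  --name=docker-race-test \
  --containerizer=docker \
  --docker_image=alpine \
  --command='sleep 1'
```

A short-lived task makes the window between TASK_FINISHED and executor termination small, which increases the chance of hitting the race when the `status()` dispatch is artificially delayed.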
Sequence of events leading to the race:
1. The Mesos agent receives the TASK_FINISHED status update and then subscribes on `containerizer->status()`.
2. The `containerizer->status()` operation for the TASK_FINISHED update gets delayed in the composing containerizer (e.g., due to a switch of the worker thread that executes the `status()` method).
3. The Docker executor terminates and the agent triggers TASK_FAILED.
4. The Docker containerizer destroys the container. A callback registered for the `containerizer->wait()` call in the composing containerizer dispatches a lambda that cleans up the `containers_` map.
5. The composing containerizer resumes and dispatches the `status()` call to the Docker containerizer for TASK_FINISHED, which in turn hangs for a few seconds.
6. The corresponding `containerId` is removed from the `containers_` map of the composing containerizer.
7. The Mesos agent subscribes on `containerizer->status()` for the TASK_FAILED status update.
8. The composing containerizer returns "Container not found" for TASK_FAILED.
9. `Slave::_statusUpdate` stores the terminal TASK_FAILED status update in the executor's data structure.
10. The Docker containerizer resumes and finishes processing the `status()` call for TASK_FINISHED. Finally, it returns control to the `Slave::_statusUpdate` continuation, which discovers that the executor has already been destroyed.