Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
1.2.0, 1.3.0, 1.4.0, 1.5.0
None
Sprint: Mesosphere Sprint 73, Mesosphere Sprint 74
5
Description
The following code in the default executor (https://github.com/apache/mesos/blob/12be4ba002f2f5ff314fbc16af51d095b0d90e56/src/launcher/default_executor.cpp#L525-L535) shows that if a `LAUNCH_NESTED_CONTAINER` call fails (say, due to a fetcher failure), the whole executor is shut down:
    // Check if we received a 200 OK response for all the
    // `LAUNCH_NESTED_CONTAINER` calls. Shutdown the executor
    // if this is not the case.
    foreach (const Response& response, responses.get()) {
      if (response.code != process::http::Status::OK) {
        LOG(ERROR) << "Received '" << response.status << "' ("
                   << response.body << ") while launching child container";
        _shutdown();
        return;
      }
    }
This is not what a user would expect. A failed `LAUNCH_GROUP` should not affect other task groups launched by the same executor, just as a task failure only takes down its own task group. We should adjust the semantics so that a failed `LAUNCH_GROUP` fails only the task group being launched, without shutting down the executor or affecting its other task groups.
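To make the proposed semantics concrete, here is a minimal, self-contained sketch of the intended behavior. It is a toy model, not the default executor's code: `LaunchResponse`, `GroupState`, `ExecutorModel`, and `handleLaunchResponses` are hypothetical names introduced for illustration, and the HTTP status is modeled as a plain integer rather than `process::http::Status`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the HTTP response to one
// `LAUNCH_NESTED_CONTAINER` call (not the libprocess type).
struct LaunchResponse {
  int code;          // HTTP status code, e.g. 200.
  std::string body;  // Error detail, if any.
};

// Hypothetical per-task-group state tracked by the toy executor.
enum class GroupState { RUNNING, FAILED };

// Hypothetical model of an executor that hosts several task groups.
struct ExecutorModel {
  std::map<std::string, GroupState> groups;
  bool shutdown = false;
};

// Proposed semantics: a non-200 response fails only the task group
// that is being launched. The executor keeps running and the other
// task groups are left untouched.
//
// Assumption: a real fix would presumably also have to clean up any
// child containers of this group that did launch, and report the
// failure for the group's tasks. That is omitted here.
void handleLaunchResponses(
    ExecutorModel& executor,
    const std::string& groupId,
    const std::vector<LaunchResponse>& responses)
{
  assert(!groupId.empty());

  for (const LaunchResponse& response : responses) {
    if (response.code != 200) {
      // Previously: shut down the whole executor at this point.
      executor.groups[groupId] = GroupState::FAILED;
      return;
    }
  }

  executor.groups[groupId] = GroupState::RUNNING;
}
```

With this model, launching one group successfully and then a second group whose launch hits a fetcher failure leaves the first group in `RUNNING`, marks only the second as `FAILED`, and never sets `shutdown`.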