  Mesos / MESOS-8468

`LAUNCH_GROUP` failure tears down the default executor.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.2.0, 1.3.0, 1.4.0, 1.5.0
    • Fix Version/s: 1.5.1, 1.6.0
    • None

    Description

      The following code in the default executor (https://github.com/apache/mesos/blob/12be4ba002f2f5ff314fbc16af51d095b0d90e56/src/launcher/default_executor.cpp#L525-L535) shows that if a `LAUNCH_NESTED_CONTAINER` call fails (say, due to a fetcher failure), the whole executor is shut down:

      // Check if we received a 200 OK response for all the
      // `LAUNCH_NESTED_CONTAINER` calls. Shutdown the executor
      // if this is not the case.
      foreach (const Response& response, responses.get()) {
        if (response.code != process::http::Status::OK) {
          LOG(ERROR) << "Received '" << response.status << "' ("
                     << response.body << ") while launching child container";
          _shutdown();
          return;
        }
      }
      

      This is not what a user would expect. A failed `LAUNCH_GROUP` should not affect other task groups launched by the same executor, just as a task failure only takes down its own task group. We should adjust the semantics so that a failed `LAUNCH_GROUP` neither takes down the executor nor affects other task groups.
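
The proposed semantics could be sketched roughly as follows. This is a simplified, standalone model, not the actual fix in `default_executor.cpp`: `Response`, `LaunchOutcome`, and `handleLaunchResponses` are hypothetical names introduced for illustration, and the real executor would additionally have to clean up any child containers that did launch and send status updates for the tasks in the failed group.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for `process::http::Response`; only the
// fields used by the snippet in the description are modeled.
struct Response {
  int code;
  std::string status;
  std::string body;
};

// Hypothetical result of handling the responses for one `LAUNCH_GROUP`.
enum class LaunchOutcome {
  GROUP_LAUNCHED,  // Every `LAUNCH_NESTED_CONTAINER` call returned 200 OK.
  GROUP_FAILED     // Only this task group fails; the executor keeps running.
};

// Check the responses for a single task group. On the first non-200
// response, report a failure for this group only instead of shutting
// down the whole executor.
LaunchOutcome handleLaunchResponses(const std::vector<Response>& responses)
{
  for (const Response& response : responses) {
    if (response.code != 200) {
      std::cerr << "Received '" << response.status << "' ("
                << response.body << ") while launching child container; "
                << "failing only this task group" << std::endl;
      return LaunchOutcome::GROUP_FAILED;
    }
  }

  return LaunchOutcome::GROUP_LAUNCHED;
}
```

With this shape, the caller decides what to do with a `GROUP_FAILED` result for one group, while task groups that were launched earlier by the same executor are left untouched.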

          People

            gkleiman Gastón Kleiman
            chhsia0 Chun-Hung Hsiao
            Qian Zhang Qian Zhang