  Mesos / MESOS-2047

Isolator cleanup failures shouldn't cause TASK_LOST.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.1
    • Component/s: None
    • Labels: None

      Description

      Right now, if an isolator cleanup fails, we transition all pending tasks to TASK_LOST (even in the OOM case, where we should have transitioned them to TASK_FAILED).

      The problematic code is here:

      void MesosContainerizerProcess::___destroy(
          const ContainerID& containerId,
          const Future<Option<int>>& status,
          const Future<list<Future<Nothing>>>& cleanups)
      {
        // This should not occur because we only use the Future<list> to
        // facilitate chaining.
        CHECK_READY(cleanups);

        // Check cleanup succeeded for all isolators. If not, we'll fail the
        // container termination and remove the 'destroying' flag but leave
        // all other state. The container is now in an inconsistent state.
        foreach (const Future<Nothing>& cleanup, cleanups.get()) {
          if (!cleanup.isReady()) {
            promises[containerId]->fail(
              "Failed to clean up an isolator when destroying container '" +
              stringify(containerId) + "' :" +
              (cleanup.isFailed() ? cleanup.failure() : "discarded future"));

            destroying.erase(containerId);

            return;
          }
        }
      

      Since launcher->destroy has already succeeded (all processes are killed), instead of failing promises[containerId] we should probably just export the error through metrics (so that people can be alerted) and still set the termination appropriately.
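      As a rough sketch (not the actual fix), the loop above could log and count the failure via a hypothetical 'container_destroy_errors' counter instead of failing promises[containerId], and then fall through to set the termination from 'status' as usual:

      // Hypothetical sketch only: record isolator cleanup failures instead of
      // failing the container termination.
      foreach (const Future<Nothing>& cleanup, cleanups.get()) {
        if (!cleanup.isReady()) {
          LOG(ERROR) << "Failed to clean up an isolator when destroying "
                     << "container '" << containerId << "': "
                     << (cleanup.isFailed() ? cleanup.failure()
                                            : "discarded future");

          // Hypothetical counter, exported so operators can alert on it.
          ++metrics.container_destroy_errors;
        }
      }

      // Continue to set the termination based on 'status' so the task is
      // reported as TASK_FAILED/TASK_FINISHED rather than TASK_LOST.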

        Activity

        bmahler Benjamin Mahler added a comment -

        For the short term, should we change the following known race to not be an error?

        port_mapping.cpp
          Try<Nothing> unmount = fs::unmount(target, MNT_DETACH);
          if (unmount.isError()) {
            errors.push_back("Failed to umount: " + unmount.error());
          }
        
          // MNT_DETACH does a lazy umount, which means umount will eventually
          // succeed when the mount point becomes idle, but possibly not
          // soon enough every time for this remove to go through, e.g.,
          // someone entered into the container for debugging purposes. In that
          // case remove will fail, which is okay, because we only leaked an
          // empty file, which could also be reused later if the pid (the name
          // of the file) is used again. However, we still return error to
          // indicate that the cleanup hasn't been successful.
          Try<Nothing> rm = os::rm(target);
          if (rm.isError()) {
            errors.push_back("Failed to remove " + target + ": " + rm.error());
          }
        

        Seems like this isn't something that should surface a TASK_LOST.
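
        One possible shape of that change, sketched against the same port_mapping.cpp cleanup code quoted above: keep treating the unmount failure as an error, but only log the os::rm failure rather than appending it to 'errors':

          // Sketch only: the lazy (MNT_DETACH) umount may not have completed
          // yet, so a failed remove only leaks an empty file; log it instead
          // of turning it into a cleanup error.
          Try<Nothing> rm = os::rm(target);
          if (rm.isError()) {
            LOG(WARNING) << "Failed to remove " << target
                         << ": " << rm.error();
          }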

        bmahler Benjamin Mahler added a comment -

        Ian Downes Looks like we have the same code in the pid isolator, but it's not an error in this case...?

        Future<Nothing> NamespacesPidIsolatorProcess::cleanup(
            const ContainerID& containerId)
        {
          const string target = nsExtraReference(containerId);
        
          if (os::exists(target)) {
            // We don't expect anyone to have a reference to target but do a
            // lazy umount in case. We do not want to force the umount; it
            // will not cause an issue if this umount is delayed.
            Try<Nothing> unmount = fs::unmount(target, MNT_DETACH);
        
            // This will fail if the unmount hasn't completed yet but this
            // only leaks a uniquely named empty file that will be cleaned up as
            // an orphan on recovery.
            os::rm(target);
          }
        
          return Nothing();
        }
        
        idownes Ian Downes added a comment -

        Yes, it doesn't make sense to do a lazy unmount and then fail if you cannot immediately remove the mount target.

        The pid isolator code just leaves the empty file around if it cannot immediately remove it, i.e., it's not considered a cleanup failure. I suggest the port_mapping code act in the same manner.

        jieyu Jie Yu added a comment -

        Ian Downes Do you agree with the points I made in the description? Yes, we can get around this particular issue by removing the error check for os::rm. However, we never know whether there will be other errors in the future (from other isolators) that we can safely ignore.

        idownes Ian Downes added a comment -

        I do agree that it's incorrect to return TASK_LOST if all processes have been terminated but we failed to clean up other state, i.e., from the framework's perspective the task completed (whether it be FAILED or whatever).

        The short-term fix is to avoid failing on the os::rm, but longer term we should do as you suggest, perhaps even partitioning into ::terminate and ::cleanup so the slave has visibility into the stages.
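
        A hedged sketch of that longer-term split (hypothetical interface, not the actual Mesos Isolator API): a 'terminate' phase whose failure would legitimately fail the destroy, and a best-effort 'cleanup' phase whose errors would only be logged/metered:

          class Isolator
          {
          public:
            virtual ~Isolator() {}

            // Must succeed before the container termination is reported;
            // failures here can still fail the destroy.
            virtual process::Future<Nothing> terminate(
                const ContainerID& containerId) = 0;

            // Best-effort resource cleanup; failures here should be exported
            // through metrics rather than surfacing as TASK_LOST.
            virtual process::Future<Nothing> cleanup(
                const ContainerID& containerId) = 0;
          };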

        jieyu Jie Yu added a comment -

        Short-term fix: https://reviews.apache.org/r/28339

        jieyu Jie Yu added a comment -

        commit 5d3093a39970a5cd0b907de13d96c8dc046ce3e3
        Author: Jie Yu <yujie.jay@gmail.com>
        Date: Fri Nov 21 10:35:23 2014 -0800

        Do not return error if removing bind mount fails.

        Review: https://reviews.apache.org/r/28339

        jieyu Jie Yu added a comment -

        Closing this for now, as the short-term fix has been pushed.


          People

          • Assignee:
            Unassigned
          • Reporter:
            jieyu Jie Yu
          • Votes:
            0
          • Watchers:
            3
