  1. Mesos
  2. MESOS-8716

Freezer controller is not returned to thaw if task termination fails


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.3.2
    • Fix Version/s: None
    • Component/s: agent, containerization
    • Labels: None

    Description

      This issue is related to https://issues.apache.org/jira/browse/MESOS-8004. A container may fail to terminate for a variety of reasons. One common reason in our system is that containers relying on external storage run fsync before exiting (fsync on SIGTERM), which can cause termination to time out.

       

      Even though Mesos has sent the requisite kill signals, the task will never terminate because the cgroup stays frozen. 

       

      The intended behaviour on failure to terminate should be: if the pids isolator is running, set pids.max to 0 to prevent further processes from being created; walk the cgroup and send SIGKILL to every process in it; then thaw the cgroup. Once the processes are thawed, the kill signal will be delivered and processed, and the container will finally finish.
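
The sequence proposed above could be sketched roughly as follows. This is only an illustration in Python, not Mesos code: `force_destroy` and its parameters are hypothetical names, and it assumes a cgroup v1 layout in which the freezer hierarchy exposes `cgroup.procs` and `freezer.state` and the pids hierarchy exposes `pids.max`. The `kill` argument is injectable so the logic can be exercised against a fake directory tree.

```python
import os
import signal
from pathlib import Path
from typing import Callable, List, Optional


def force_destroy(
    freezer_cgroup: Path,
    pids_cgroup: Optional[Path] = None,
    kill: Callable[[int, int], None] = os.kill,
) -> List[int]:
    """Hypothetical helper: cap forks, SIGKILL every pid, then thaw."""
    # Step 1: if the pids controller is in use, forbid new processes
    # so nothing can be forked between the kill and the thaw.
    if pids_cgroup is not None:
        (pids_cgroup / "pids.max").write_text("0")

    # Step 2: walk the frozen cgroup (and any nested cgroups) and queue
    # SIGKILL for each process. Frozen tasks cannot act on it yet.
    killed: List[int] = []
    for procs_file in sorted(freezer_cgroup.rglob("cgroup.procs")):
        for entry in procs_file.read_text().split():
            pid = int(entry)
            try:
                kill(pid, signal.SIGKILL)
                killed.append(pid)
            except ProcessLookupError:
                pass  # the process exited before we reached it

    # Step 3: thaw the cgroup so the pending SIGKILLs are delivered.
    (freezer_cgroup / "freezer.state").write_text("THAWED")
    return killed
```

The ordering is the point of the sketch: the fork limit comes first so the process set cannot grow, the signals are queued while everything is still frozen, and the thaw comes last so that the first thing each process does on resuming is handle SIGKILL.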

            People

              Assignee: Unassigned
              Reporter: Sargun Dhillon (sargun)
              Votes: 0
              Watchers: 4
