Mesos / MESOS-9148

Make cgroups destroy timeout configurable for Mesos containerizer

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.0
    • Component/s: None
    • Labels: None

    Description

      Previously, all containers launched by the Mesos containerizer used the same 1-minute timeout for destroying their cgroups. However, we have observed that for certain containers (possibly with deep system calls), the cgroup hierarchy was not destroyed within that timeout. This is quite problematic because the containerizer then short-circuits the destroy routine and skips isolator::cleanup. We have observed GPU resources being leaked indefinitely because of this behavior (see MESOS-8038).

      The proposed workaround is to add an optional agent flag that allows operators to override this timeout.
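
      The override described above can be sketched as follows. This is a minimal illustration, not the Mesos implementation (which is C++): `AgentFlags`, its `cgroups_destroy_timeout` field, and both helper functions are hypothetical names chosen for this example. The only detail taken from the issue is the 1-minute default.

```python
import threading
from dataclasses import dataclass
from datetime import timedelta
from typing import Callable, Optional

# Default taken from the description: 1 minute for every container.
DEFAULT_CGROUPS_DESTROY_TIMEOUT = timedelta(minutes=1)


@dataclass
class AgentFlags:
    """Hypothetical stand-in for agent flags; None means the operator set nothing."""
    cgroups_destroy_timeout: Optional[timedelta] = None


def effective_destroy_timeout(flags: AgentFlags) -> timedelta:
    """Return the operator override if present, otherwise the 1-minute default."""
    if flags.cgroups_destroy_timeout is None:
        return DEFAULT_CGROUPS_DESTROY_TIMEOUT
    if flags.cgroups_destroy_timeout <= timedelta(0):
        raise ValueError("destroy timeout must be positive")
    return flags.cgroups_destroy_timeout


def destroy_with_timeout(destroy: Callable[[], None], timeout: timedelta) -> bool:
    """Run `destroy` in a worker thread; return True only if it finished in time."""
    done = threading.Event()

    def worker() -> None:
        destroy()
        done.set()

    threading.Thread(target=worker, daemon=True).start()
    # Event.wait returns False when the timeout elapses before the event is set.
    return done.wait(timeout.total_seconds())
```

      With no override, `effective_destroy_timeout(AgentFlags())` yields the 1-minute default. An operator who sees slow cgroup teardown could pass a larger value, for example `AgentFlags(timedelta(minutes=5))`. A `False` result from `destroy_with_timeout` corresponds to the case in the description where the hierarchy is not destroyed in time.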

          People

            Assignee: Zhitao Li
            Reporter: Zhitao Li
            Shepherd: Gilbert Song