Mesos / MESOS-9148

Make cgroups destroy timeout configurable for Mesos containerizer

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.0
    • Component/s: None
    • Labels: None
    • Target Version/s:

      Description

      Previously, all containers launched by the Mesos containerizer used the same 1-minute timeout for destroying their cgroups. However, we have observed that for certain containers (possibly ones blocked in deep system calls), the cgroup hierarchy was not destroyed within that timeout. This is quite problematic because the containerizer short-circuits the destroy routine and skips isolator::cleanup. We have observed GPU resources being leaked indefinitely because of this bug (see MESOS-8038).

      The proposed workaround is to add an optional agent flag that allows operators to override this timeout.
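
      As a rough illustration of the idea, the sketch below replaces a hard-coded 1-minute wait with a value read from a flags struct. It is a standalone approximation, not the Mesos implementation: `AgentFlags`, the field name `cgroups_destroy_timeout`, and `destroyedInTime` are all assumed names, and a plain `std::future` stands in for the asynchronous cgroup destroy.

      ```cpp
      #include <chrono>
      #include <future>

      // Hypothetical stand-in for the agent flags; not the real Mesos type.
      struct AgentFlags {
        // Assumed flag name. Defaults to the previously hard-coded 1 minute,
        // so behavior is unchanged unless the operator overrides it.
        std::chrono::milliseconds cgroups_destroy_timeout = std::chrono::minutes(1);
      };

      // Waits for the (stand-in) cgroup destroy to finish, using the
      // configured timeout instead of a constant. Returns true if the destroy
      // completed in time, false if the timeout elapsed first.
      bool destroyedInTime(std::future<void>& destroy, const AgentFlags& flags) {
        return destroy.wait_for(flags.cgroups_destroy_timeout) ==
               std::future_status::ready;
      }
      ```

      With this shape, an operator who sees slow cgroup teardown could raise the value (for example to several minutes) so that the destroy has a chance to complete before the containerizer gives up on it.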

            People

            • Assignee: Zhitao Li (zhitao)
            • Reporter: Zhitao Li (zhitao)
            • Shepherd: Gilbert Song
