Mesos / MESOS-941

Memory limit not correctly set when no memory resource set on executor level

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.17.0
    • Component/s: agent
    • Labels: None

    Description

      When a framework is launched with memory resources set only on the tasks and none set at the executor level, the slave fails to apply the memory control needed to limit the executor's memory usage. As a result, the executor process can use more resident memory than the tasks specify.

      Example framework: https://gist.github.com/lin-zhao/8544495. This framework was tested with Mesos 0.14.2 on CentOS 6, kernel 3.10.11-1.el6.x86_64.

      According to Benjamin Mahler:

      What's happening is that you're launching an executor with no resources. Consequently, before we fork, we attempt to update the memory control, but we don't call the memory handler because the executor has no memory resources:

      I0121 19:39:01.660071 8566 cgroups_isolator.cpp:516] Launching default (/home/lin/test-executor) in /tmp/mesos/slaves/201312032357-3645772810-5050-2033-0/frameworks/201401171812-2907575306-5050-19011-0020/executors/default/runs/8bc2ab10-8988-4b22-afa2-3433bbedc3ed with resources for framework 201401171812-2907575306-5050-19011-0020 in cgroup mesos/framework_201401171812-2907575306-5050-19011-0020_executor_default_tag_8bc2ab10-8988-4b22-afa2-3433bbedc3ed
      I0121 19:39:01.663082 8566 cgroups_isolator.cpp:709] Changing cgroup controls for executor default of framework 201401171812-2907575306-5050-19011-0020 with resources
      I0121 19:39:01.667129 8566 cgroups_isolator.cpp:1163] Started listening for OOM events for executor default of framework 201401171812-2907575306-5050-19011-0020
      I0121 19:39:01.681857 8566 cgroups_isolator.cpp:568] Forked executor at = 27609
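
The behaviour described above can be sketched in C++, assuming control updates are dispatched once per resource name. The names `Resources`, `CgroupState`, and `applyControls` are hypothetical and are not taken from the Mesos source. The point is only that a handler keyed on `mem` never runs when the resource set has no `mem` entry.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for a resource set: resource name -> amount.
using Resources = std::map<std::string, double>;

// Hypothetical stand-in for the memory state of an executor's cgroup.
struct CgroupState {
  bool hardLimitSet = false;         // stays false until the "mem" handler runs
  std::uint64_t hardLimitBytes = 0;  // meaningful only when hardLimitSet is true
};

// Invokes one handler per resource that is present in 'resources'.
// Returns the number of handlers that were invoked.
inline int applyControls(const Resources& resources, CgroupState& state) {
  std::map<std::string, std::function<void(double, CgroupState&)>> handlers;

  // Assumed memory handler: converts megabytes to bytes and records the limit.
  handlers["mem"] = [](double megabytes, CgroupState& s) {
    s.hardLimitSet = true;
    s.hardLimitBytes = static_cast<std::uint64_t>(megabytes) * 1024 * 1024;
  };

  int invoked = 0;
  for (Resources::const_iterator it = resources.begin();
       it != resources.end(); ++it) {
    auto handler = handlers.find(it->first);
    if (handler != handlers.end()) {
      handler->second(it->second, state);
      ++invoked;
    }
  }
  return invoked;
}
```

Calling `applyControls` with an empty `Resources` map, as for an executor launched with no resources, invokes no handler and leaves `hardLimitSet` false.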

      Then, later, when we are updating the resources for your 128MB task, we set the soft limit, but we don't set the hard limit because the following buggy check is not satisfied:

      // Determine whether to set the hard limit. If this is the first
      // time (info->pid.isNone()), or we're raising the existing limit,
      // then we can update the hard limit safely. Otherwise, if we need
      // to decrease 'memory.limit_in_bytes' we may induce an OOM if too
      // much memory is in use. As a result, we only update the soft
      // limit when the memory reservation is being reduced. This is
      // probably okay if the machine has available resources.
      // TODO(benh): Introduce a MemoryWatcherProcess which monitors the
      // discrepancy between usage and soft limit and introduces a
      // "manual oom" if necessary.
      if (info->pid.isNone() || limit > currentLimit.get()) {
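
To make the failure concrete, here is a small C++ model of the check quoted above. It is a sketch, not the Mesos implementation: `shouldSetHardLimit` and `kUnlimited` are hypothetical names. It also rests on an assumption the report does not state, namely that a cgroup whose hard limit was never written reports an effectively unlimited `memory.limit_in_bytes`. Under that assumption, once the executor has been forked (so the pid is set), a 128MB request is never greater than the current limit and the hard limit is never written.

```cpp
#include <cstdint>
#include <limits>

// Assumption: an untouched cgroup reports a very large hard limit.
const std::uint64_t kUnlimited =
    static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());

// Mirrors the quoted condition: update the hard limit only on the first
// update (no pid recorded yet) or when the requested limit is being raised.
inline bool shouldSetHardLimit(bool pidIsNone,
                               std::uint64_t limit,
                               std::uint64_t currentLimit) {
  return pidIsNone || limit > currentLimit;
}
```

With the pid already set and the current limit at `kUnlimited`, `shouldSetHardLimit(false, 128ull * 1024 * 1024, kUnlimited)` evaluates to false. That matches the behaviour described in the report: the soft limit is updated, but the hard limit is left alone.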

          People

            Assignee: Vinod Kone (vinodkone)
            Reporter: Lin Zhao (linzhao)