Mesos / MESOS-10015

updateAllocation() can stall the allocator with a huge number of reservations on an agent.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.5.3, 1.6.2, 1.7.2, 1.8.1, 1.9.0
    • Fix Version/s: 1.7.3, 1.8.2, 1.9.1, 1.10.0
    • None
    • Resource Mgmt: RI-20 58
    • 5

    Description

      Currently, a single call to updateAllocation() with a `Resources` containing one object, for one framework on one agent, requires `(total number of frameworks) * (number of resource objects on this agent)^2` calls to `Resource::addable()`.
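
      The quadratic factor in the count above can be illustrated with a toy model. This is only a sketch and not the Mesos implementation: `ToyResource`, `ToyResources`, `rebuild()` and `simulateUpdate()` are hypothetical names, and the model assumes that each framework triggers one rebuild of the agent's resource container and that a container stores its objects in a flat list checked pairwise for mergeability. The issue itself states only the resulting count, not the mechanism.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a Mesos `Resource`; only what the model needs.
struct ToyResource {
  std::string name;
  std::string reservationLabel;  // Distinct labels make objects unique.
  double amount;
};

// Counts calls to the toy `addable()` below.
static std::uint64_t addableCalls = 0;

// Toy analogue of `Resource::addable()`: two objects can be merged only
// if their metadata matches exactly.
bool addable(const ToyResource& a, const ToyResource& b) {
  ++addableCalls;
  return a.name == b.name && a.reservationLabel == b.reservationLabel;
}

// Toy analogue of a `Resources` container backed by a flat list.
struct ToyResources {
  std::vector<ToyResource> items;

  // Merge into an addable object if one exists, otherwise append.
  void add(const ToyResource& r) {
    for (ToyResource& existing : items) {
      if (addable(existing, r)) {
        existing.amount += r.amount;
        return;
      }
    }
    items.push_back(r);
  }
};

// Rebuilding a container of n unique objects costs
// 0 + 1 + ... + (n - 1) = n * (n - 1) / 2 addable() calls.
ToyResources rebuild(const ToyResources& source) {
  ToyResources result;
  for (const ToyResource& r : source.items) {
    result.add(r);
  }
  return result;
}

// ASSUMPTION: one rebuild of the agent's resources per framework.
std::uint64_t simulateUpdate(std::size_t frameworks,
                             std::size_t objectsPerAgent) {
  ToyResources agentTotal;
  for (std::size_t i = 0; i < objectsPerAgent; ++i) {
    agentTotal.items.push_back({"cpus", "label-" + std::to_string(i), 1.0});
  }

  addableCalls = 0;
  for (std::size_t f = 0; f < frameworks; ++f) {
    ToyResources rebuilt = rebuild(agentTotal);
    assert(rebuilt.items.size() == objectsPerAgent);  // Nothing merged.
  }
  return addableCalls;
}
```

      In this model, `simulateUpdate(10, 100)` performs 10 * (100 * 99 / 2) = 49500 calls, and doubling the number of unique objects on the agent roughly quadruples the count, which matches the order of growth described above up to a constant factor.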

      In a cluster with a large number of frameworks, this severely degrades allocator performance when many RESERVE/UNRESERVE operations occur for an agent with hundreds of unique resources.

      On our testing cluster we observed task scheduling delays of up to 30 minutes because the allocator was busy processing UNRESERVE operations.

      Attachments

        1. out.svg
          49 kB
          Benjamin Mahler

            People

              Assignee: Andrei Sekretenko (asekretenko)
              Reporter: Andrei Sekretenko (asekretenko)
              Votes: 0
              Watchers: 2
