MESOS-10015

updateAllocation() can stall the allocator with a huge number of reservations on an agent.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.5.3, 1.6.2, 1.7.2, 1.8.1, 1.9.0
    • Fix Version/s: 1.7.3, 1.8.2, 1.9.1, 1.10.0
    • Component/s: None
    • Target Version/s:
    • Sprint:
      Resource Mgmt: RI-20 58
    • Story Points:
      5

      Description

      Currently, a single `updateAllocation()` call for a `Resources` holding one object, for one framework on one agent, requires `(total number of frameworks) * (number of resource objects on this agent)^2` calls to `Resource::addable()`.
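
The quadratic factor can be illustrated with a toy model. This is a minimal sketch, not the Mesos allocator code: `ToyResource`, `ToyResources`, `uniqueResources` and `countAddableCalls` are hypothetical names, and the sketch assumes that adding an object to a container means scanning the existing objects with an `addable()`-style check. Under that assumption, re-summing `n` pairwise non-mergeable objects costs `n * (n - 1) / 2` checks, and repeating it once per framework multiplies that by the number of frameworks, which has the same order of growth as the formula above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a resource object; NOT the real Mesos protobuf type.
struct ToyResource {
  std::string name;         // e.g. "cpus"
  std::string reservation;  // hypothetical label making the object unique
  double amount;
};

// Counts calls to the toy addable() check below.
static std::size_t addableCalls = 0;

// Two toy resources can be merged only if their metadata matches.
bool addable(const ToyResource& a, const ToyResource& b) {
  ++addableCalls;
  return a.name == b.name && a.reservation == b.reservation;
}

// Naive container: adding one object scans the existing objects linearly.
struct ToyResources {
  std::vector<ToyResource> items;

  void add(const ToyResource& r) {
    for (ToyResource& existing : items) {
      if (addable(existing, r)) {
        existing.amount += r.amount;
        return;
      }
    }
    items.push_back(r);
  }
};

// Builds n resources with pairwise distinct reservation labels.
std::vector<ToyResource> uniqueResources(std::size_t n) {
  std::vector<ToyResource> out;
  for (std::size_t i = 0; i < n; ++i) {
    out.push_back({"cpus", "reservation-" + std::to_string(i), 1.0});
  }
  return out;
}

// Re-sums the agent's n unique objects once per framework and
// returns how many addable() checks that took in total.
std::size_t countAddableCalls(std::size_t frameworks, std::size_t n) {
  assert(n > 0);
  const std::vector<ToyResource> agent = uniqueResources(n);
  addableCalls = 0;
  for (std::size_t f = 0; f < frameworks; ++f) {
    ToyResources sum;
    for (const ToyResource& r : agent) {
      sum.add(r);
    }
    assert(sum.items.size() == n);  // nothing merged: all objects are unique
  }
  return addableCalls;
}
```

In this toy model, `countAddableCalls(10, 100)` returns 49500, that is `10 * (100 * 99 / 2)`. Doubling the number of unique objects roughly quadruples the count, while doubling the number of frameworks only doubles it.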

      In a cluster with a large number of frameworks, this severely degrades allocator performance when a burst of RESERVE/UNRESERVE operations occurs on an agent with hundreds of unique resources.

      On our testing cluster, we observed task scheduling delays of up to 30 minutes because the allocator was busy processing UNRESERVE operations.

        Attachments

        1. out.svg
          49 kB
          Benjamin Mahler

              People

              • Assignee:
                Andrei Sekretenko (asekretenko)
              • Reporter:
                Andrei Sekretenko (asekretenko)