Mesos / MESOS-9806

Address allocator performance regression due to the addition of quota limits.

Details

    • Improvement
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • None
    • 1.9.0
    • allocation
    • Resource Mgmt: RI-17 Sprint 53
    • 5

    Description

      In MESOS-9802, we removed the quota role sorter, which was tech debt.

      However, this slows down the allocator. In the first allocation stage, the allocator now has to sort and iterate over every role in the cluster, even when the cluster has no active roles with non-default quota. Benchmark results show that with 1k roles and 2k frameworks, the allocator can suffer ~50% performance degradation.

      There are a couple of ways to address this issue. For example, we could make the sorter aware of quota and add a method, say `sortQuotaRoles`, that returns all the roles with non-default quota. Alternatively, an even better approach would be to deprecate the sorter concept and have just two standalone functions, e.g. `sortRoles()` and `sortQuotaRoles()`, that take the role tree structure (which does not yet exist in the allocator) and return the sorted roles.
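The second option could look roughly like the sketch below. It is only an illustration under stated assumptions: the `Role` struct, its `hasQuota` and `share` fields, and the helper `collectQuotaRoles` are hypothetical stand-ins for the role tree that does not yet exist in the allocator, and ordering by a single `share` value is a simplification of what a real sorter would do.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical node of the role tree; not an existing allocator type.
struct Role {
  std::string name;
  bool hasQuota;               // false means the default (no) quota
  double share;                // assumed precomputed, e.g. a dominant share
  std::vector<Role> children;
};

// Walks the tree and collects every role that has a non-default quota.
void collectQuotaRoles(const Role& role, std::vector<const Role*>& out) {
  if (role.hasQuota) {
    out.push_back(&role);
  }
  for (const Role& child : role.children) {
    collectQuotaRoles(child, out);
  }
}

// Returns the names of the quota roles, ordered by ascending share
// (ties broken by name). Roles without quota never reach the sort.
std::vector<std::string> sortQuotaRoles(const Role& root) {
  std::vector<const Role*> roles;
  collectQuotaRoles(root, roles);

  std::sort(roles.begin(), roles.end(), [](const Role* a, const Role* b) {
    if (a->share != b->share) {
      return a->share < b->share;
    }
    return a->name < b->name;
  });

  std::vector<std::string> names;
  names.reserve(roles.size());
  for (const Role* role : roles) {
    names.push_back(role->name);
  }
  return names;
}
```

In this sketch only the roles with quota are sorted, so a cluster with no quota roles pays for one tree walk and no sorting in the first stage. Avoiding the walk as well would require tracking the quota roles separately, which is not shown here.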

      In addition, with the implementation of MESOS-8068, the allocator needs to do more work during the allocation cycle. In particular, we need to call shrink many more times than before. All of this contributes to the performance slowdown. Specifically, for the quota-oriented benchmark `HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota/2`, we observe a 2-3x slowdown compared to the previous release (1.8.1):

      Current master:

      QuotaParam/BENCHMARK_HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota/2
      Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
      Made 3500 allocations in 32.051382735secs
      Made 0 allocation in 27.976022773secs

      1.8.1:
      HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota/2
      Made 3500 allocations in 13.810811063secs
      Made 0 allocation in 9.885972984secs
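The 2-3x figure can be checked directly from the timings above. The helper below is a minimal sketch; `slowdown` is a hypothetical name and is not part of the benchmark code.

```cpp
// Ratio of the current wall-clock time to the baseline time.
// A value greater than 1 means the current build is slower.
double slowdown(double currentSecs, double baselineSecs) {
  return currentSecs / baselineSecs;
}
```

With the numbers reported above, `slowdown(32.051382735, 13.810811063)` is roughly 2.32 for the run that made 3500 allocations, and `slowdown(27.976022773, 9.885972984)` is roughly 2.83 for the run that made no allocations.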

          People

            mzhu Meng Zhu
            mzhu Meng Zhu

              Agile

                Completed Sprint:
                Resource Mgmt: RI-17 Sprint 53 ended 28/Aug/19