  1. Mesos
  2. MESOS-5377

Improve DRF behavior with scarce resources.


Details

    • Type: Epic
    • Status: Accepted
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: allocation
    • Labels: None
    • Epic Name: Scarce Resources

    Description

      The allocator currently uses the notion of Weighted Dominant Resource Fairness (WDRF) to establish a linear notion of fairness across allocation roles.

      DRF behaves well for resources that are present within each machine in a cluster (e.g. CPUs, memory, disk). However, some resources (e.g. GPUs) are only present on a subset of machines in the cluster.

      Consider the behavior when there are the following agents in a cluster:

      1000 agents with (cpus:4,mem:1024,disk:1024)
      1 agent with (gpus:1,cpus:4,mem:1024,disk:1024)

      If a role wishes to use both GPU and non-GPU resources for tasks, consuming 1 GPU will lead DRF to consider the role to have a 100% share of the cluster, since it consumes 100% of the GPUs in the cluster. Frameworks in this role will then not receive any other offers.
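
      The arithmetic behind this can be sketched as follows. This is a minimal illustration of the plain (unweighted) DRF dominant-share calculation, not the Mesos allocator's code. The `cluster_total` and `dominant_share` helpers and the two sample allocations are assumptions made for the example.

```python
# Cluster from the description: 1000 CPU-only agents plus 1 GPU agent.
AGENT = {"cpus": 4, "mem": 1024, "disk": 1024}
GPU_AGENT = {"gpus": 1, "cpus": 4, "mem": 1024, "disk": 1024}


def cluster_total(agents):
    """Sum each resource over all agents."""
    total = {}
    for agent in agents:
        for name, amount in agent.items():
            total[name] = total.get(name, 0) + amount
    return total


def dominant_share(allocation, total):
    """Plain DRF: the largest fraction of any single resource held."""
    return max(
        (amount / total[name]
         for name, amount in allocation.items()
         if total.get(name)),
        default=0.0,
    )


total = cluster_total([AGENT] * 1000 + [GPU_AGENT])

# Hypothetical allocation: one small task that uses the only GPU.
gpu_task = {"gpus": 1, "cpus": 1, "mem": 128}
# Hypothetical allocation: a role holding 100 whole CPU-only agents.
cpu_only = {"cpus": 400, "mem": 102400, "disk": 102400}

print(dominant_share(gpu_task, total))  # 1.0, driven entirely by gpus
print(dominant_share(cpu_only, total))  # roughly 0.0999
```

      Under these assumptions, the role holding a single GPU has a larger dominant share than a role holding 100 entire agents, which is the behavior described above.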

      One possible improvement is for the fairness model to understand resource packages. In a sense, there is 1 GPU package being competed for and 1000 non-GPU packages being competed for. Ideally, a role's consumption of the single GPU package would not have a large effect on that role's access to the other 1000 non-GPU packages.
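
      One way to picture the package idea is to measure a role's share separately within each kind of package. The sketch below is only an illustration under that assumption. This issue does not specify an algorithm, and the `package_shares` helper, the "gpu" and "non-gpu" grouping, and the choice to count whole agents as packages are all hypothetical.

```python
# Number of packages (whole agents) of each kind in the example cluster.
PACKAGES = {"gpu": 1, "non-gpu": 1000}


def package_shares(held, packages=PACKAGES):
    """Fraction of each package kind a role holds (hypothetical measure)."""
    return {kind: held.get(kind, 0) / count for kind, count in packages.items()}


# A role that holds the single GPU package and nothing else.
shares = package_shares({"gpu": 1})
print(shares)  # {'gpu': 1.0, 'non-gpu': 0.0}
```

      Here the role is saturated on the GPU package but holds no share of the non-GPU packages, so an allocator that compared roles per package kind could keep offering it non-GPU agents.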

      In the interim, we should consider having a recommended way to deal with scarce resources in the current model.


          People

            gyliu Guangya Liu
            bmahler Benjamin Mahler
            Benjamin Mahler Benjamin Mahler

            Dates

              Created:
              Updated:
