Mesos / MESOS-5377

Improve DRF behavior with scarce resources.

    Details

    • Type: Epic
    • Status: Accepted
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: allocation
    • Labels:
      None
    • Epic Name:
      Scarce Resources

      Description

      The allocator currently uses the notion of Weighted Dominant Resource Fairness (WDRF) to establish a linear notion of fairness across allocation roles.

      DRF behaves well for resources that are present within each machine in a cluster (e.g. CPUs, memory, disk). However, some resources (e.g. GPUs) are only present on a subset of machines in the cluster.

      Consider the behavior when there are the following agents in a cluster:

      1000 agents with (cpus:4,mem:1024,disk:1024)
      1 agent with (gpus:1,cpus:4,mem:1024,disk:1024)

      If a role wishes to use both GPU and non-GPU resources for tasks, consuming 1 GPU will lead DRF to consider the role to have a 100% share of the cluster, since it consumes 100% of the GPUs in the cluster. Frameworks in this role will then not receive any other offers.
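To make the arithmetic concrete, here is a minimal C++ sketch of the dominant-share calculation for the cluster above. The `ResourceMap` alias and the helper names are hypothetical simplifications, not the Mesos allocator's `Resources` or sorter API, and role weights are omitted (every role is treated as weight 1).

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a resource vector: resource name -> scalar quantity.
using ResourceMap = std::map<std::string, double>;

// Dominant share = max over the role's resources of (allocated / cluster total).
double dominantShare(const ResourceMap& allocation, const ResourceMap& total) {
  double share = 0.0;
  for (const auto& [name, quantity] : allocation) {
    auto it = total.find(name);
    assert(it != total.end() && it->second > 0.0);
    share = std::max(share, quantity / it->second);
  }
  return share;
}

// Cluster from the description: 1000 non-GPU agents plus 1 GPU agent,
// each with cpus:4, mem:1024, disk:1024.
ResourceMap exampleClusterTotal() {
  const double agents = 1001.0;
  return {{"cpus", 4.0 * agents},
          {"mem", 1024.0 * agents},
          {"disk", 1024.0 * agents},
          {"gpus", 1.0}};
}
```

A role holding the single GPU plus one agent's worth of CPU, memory and disk gets a dominant share of exactly 1.0, while a role using 400 CPUs (roughly 10% of the cluster's CPUs) still has a share just under 0.1, so DRF keeps ranking the latter ahead.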

      One possible improvement is to make fairness aware of resource packages. In a sense, there is 1 GPU package that is competed on and 1000 non-GPU packages that are competed on, and ideally a role's consumption of the single GPU package does not have a large effect on the role's access to the other 1000 non-GPU packages.

      In the interim, we should consider having a recommended way to deal with scarce resources in the current model.

            Activity

            Guangya Liu (gyliu) added a comment:

            What about enhancing the sorter to ignore scarce resources when computing shares, and only consider the major resources (CPU, memory, disk, etc.)? The cluster admin could define the list of scarce resources via a master flag.
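A minimal sketch of this suggestion, assuming a simplified name-to-quantity resource map (the `ResourceMap` alias and `dominantShareExcluding` are hypothetical, not the Mesos sorter API): resources whose names appear in the operator's exclusion set are skipped when computing the dominant share.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for a resource vector: resource name -> scalar quantity.
using ResourceMap = std::map<std::string, double>;

// Dominant share that ignores every resource named in `excluded`
// (e.g. {"gpus"}), i.e. the operator-defined scarce-resource list.
double dominantShareExcluding(const ResourceMap& allocation,
                              const ResourceMap& total,
                              const std::set<std::string>& excluded) {
  double share = 0.0;
  for (const auto& [name, quantity] : allocation) {
    if (excluded.count(name) > 0) {
      continue;  // Scarce resource: does not count toward fairness.
    }
    auto it = total.find(name);
    assert(it != total.end() && it->second > 0.0);
    share = std::max(share, quantity / it->second);
  }
  return share;
}
```

With `gpus` excluded, the role from the description that holds the single GPU plus one agent's worth of CPU, memory and disk has a dominant share of about 1/1001 instead of 1.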

            Klaus Ma (klaus1982) added a comment:

            I think we can ignore resources that are 100% used, so that the dominant resource changes to CPU or memory.
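One possible reading of this proposal, sketched under the assumption that "100% used" means fully allocated cluster-wide: such a resource is skipped when picking a role's dominant resource, since no role can receive more of it anyway. The helper name and the `ResourceMap` alias are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a resource vector: resource name -> scalar quantity.
using ResourceMap = std::map<std::string, double>;

// Dominant share that skips resources whose cluster-wide allocation has
// reached the cluster total (assumed interpretation of "100% used").
double dominantShareIgnoringExhausted(const ResourceMap& roleAllocation,
                                      const ResourceMap& clusterAllocated,
                                      const ResourceMap& total) {
  double share = 0.0;
  for (const auto& [name, quantity] : roleAllocation) {
    auto totalIt = total.find(name);
    assert(totalIt != total.end() && totalIt->second > 0.0);
    auto usedIt = clusterAllocated.find(name);
    const double used =
        usedIt == clusterAllocated.end() ? 0.0 : usedIt->second;
    if (used >= totalIt->second) {
      continue;  // Exhausted resource: ignored for the dominant share.
    }
    share = std::max(share, quantity / totalIt->second);
  }
  return share;
}
```

Once the single GPU is allocated, the dominant resource of the role holding it falls back to CPU or memory, as the comment suggests; with two GPUs in the cluster and one in use, the GPU would still dominate at 0.5.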

            Qian Zhang (qianzhang) added a comment:

            Can we introduce a weight for each resource allocated by the Mesos master?
            Each resource's weight = (number of agents that have this resource) / (total number of agents)

            Then, when we calculate the resource share for each role/framework in the DRF sorter, we can take this weight into account: resource share = resource weight * (allocation / total). So, for the example in the description of this ticket, the weight of GPU would be about 0.001 (1/1001), and the GPU share of the role that consumes the only GPU would be about 0.001 rather than 1. This could be the default behavior, and we could also introduce a Mesos master flag with which the operator can explicitly set the weight of each resource, overriding the default calculation.
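The proposed weighting can be sketched as follows. `resourceWeights` and `weightedDominantShare` are hypothetical helpers over a simplified `ResourceMap`, and the per-resource agent counts are assumed to be supplied by the caller; this is an illustration of the formula above, not existing Mesos code.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a resource vector: resource name -> scalar quantity.
using ResourceMap = std::map<std::string, double>;

// Default weight from the comment:
// weight = (agents that have the resource) / (total agents).
ResourceMap resourceWeights(const std::map<std::string, int>& agentsWithResource,
                            int totalAgents) {
  assert(totalAgents > 0);
  ResourceMap weights;
  for (const auto& [name, count] : agentsWithResource) {
    weights[name] = static_cast<double>(count) / totalAgents;
  }
  return weights;
}

// Weighted dominant share: max over resources of weight * (allocation / total).
double weightedDominantShare(const ResourceMap& allocation,
                             const ResourceMap& total,
                             const ResourceMap& weights) {
  double share = 0.0;
  for (const auto& [name, quantity] : allocation) {
    auto totalIt = total.find(name);
    auto weightIt = weights.find(name);
    assert(totalIt != total.end() && totalIt->second > 0.0);
    assert(weightIt != weights.end());
    share = std::max(share, weightIt->second * (quantity / totalIt->second));
  }
  return share;
}
```

For the cluster in the description, `gpus` gets weight 1/1001 while `cpus`, `mem` and `disk` get weight 1, so holding the only GPU contributes about 0.001 to the role's share.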

            Guangya Liu (gyliu) added a comment:

            Another idea, similar to MESOS-4923: what about introducing a new sorter to handle these scarce resources? The cluster admin could define a list of scarce resources at startup, and the allocator could sort those scarce resources in a separate sorter.
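A toy illustration of the two-sorter idea: each role is ranked once by its share of the scarce resources and once by its share of everything else. The functions and the `ResourceMap` alias are hypothetical; the real allocator's sorter interface is different.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a resource vector: resource name -> scalar quantity.
using ResourceMap = std::map<std::string, double>;

// Dominant share over either only the scarce resources (scarcePass == true)
// or only the non-scarce ones (scarcePass == false).
double shareFor(const ResourceMap& allocation, const ResourceMap& total,
                const std::set<std::string>& scarce, bool scarcePass) {
  double share = 0.0;
  for (const auto& [name, quantity] : allocation) {
    if ((scarce.count(name) > 0) != scarcePass) continue;
    auto it = total.find(name);
    assert(it != total.end() && it->second > 0.0);
    share = std::max(share, quantity / it->second);
  }
  return share;
}

// A toy "sorter": orders roles by ascending share for one pass.
std::vector<std::string> sortRoles(
    const std::map<std::string, ResourceMap>& allocations,
    const ResourceMap& total, const std::set<std::string>& scarce,
    bool scarcePass) {
  std::vector<std::pair<double, std::string>> keyed;
  for (const auto& [role, allocation] : allocations) {
    keyed.emplace_back(shareFor(allocation, total, scarce, scarcePass), role);
  }
  std::sort(keyed.begin(), keyed.end());  // Ties broken by role name.
  std::vector<std::string> order;
  for (const auto& entry : keyed) order.push_back(entry.second);
  return order;
}
```

Under this split, a role that holds the only GPU ranks last in the scarce sorter but can still rank first in the regular sorter, which matches the behavior the description asks for.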

            Guangya Liu (gyliu) added a comment:

            Some comments from Benjamin Mahler:

            For now, we can implement the workaround that Guangya suggested of giving the ability for operators to specify an exclusion list for fairness. For example, to deal with a scarce number of GPUs the operator could specify --allocator_fairness_excluded_resource_names="gpus". Longer term we'll want a better solution that doesn't require a workaround but for now this workaround sounds good to me.

            Guangya Liu (gyliu) added a comment:

            Benjamin Mahler, I posted a prototype here: https://github.com/jay-lau/mesos/commit/d411e20350f9c10100314da113f705f00ea55d74

            The main idea is:
            1) Added a new flag named allocator_fairness_excluded_resource_names to define the scarce resources.
            2) Added helper functions to filter out scarce resources:

            // Tests if the given Resource object is non-scarce. If
            // fairnessExcludeResourceNames is specified, all of the resources in
            // fairnessExcludeResourceNames are treated as scarce resources
            // and are filtered out.
            static bool isNonScare(
                const Resource& resource,
                const Option<hashset<std::string>>& fairnessExcludeResourceNames);

            // Returns the non-scarce resources. All of the resources in
            // fairnessExcludeResourceNames are treated as scarce resources
            // and are filtered out.
            Resources nonScare(
                const Option<hashset<std::string>>& fairnessExcludeResourceNames =
                    None()) const;

            3) Filtered out the scarce resources in the allocator, so the sorter is not aware of them.
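A self-contained approximation of the two prototype helpers follows. It swaps Mesos's `Resource`/`Resources` and stout's `Option`/`hashset` for standard-library stand-ins (`SimpleResource`, `std::vector`, `std::optional`, `std::set`); the struct and the names `isNonScarce`/`nonScarce` are hypothetical, and this is not the code from the linked commit.

```cpp
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplified stand-in for a Mesos Resource.
struct SimpleResource {
  std::string name;
  double value;
};

// Mirrors the prototype's isNonScare(): a resource counts toward fairness
// unless its name appears in the operator-supplied exclusion list.
bool isNonScarce(const SimpleResource& resource,
                 const std::optional<std::set<std::string>>& excludedNames) {
  return !excludedNames.has_value() ||
         excludedNames->count(resource.name) == 0;
}

// Mirrors nonScare(): keeps only the resources that count toward fairness.
// With no exclusion list, every resource is kept.
std::vector<SimpleResource> nonScarce(
    const std::vector<SimpleResource>& resources,
    const std::optional<std::set<std::string>>& excludedNames = std::nullopt) {
  std::vector<SimpleResource> result;
  for (const auto& resource : resources) {
    if (isNonScarce(resource, excludedNames)) result.push_back(resource);
  }
  return result;
}
```

Because the filtering happens before shares are computed, the sorter itself never needs to know which resources are scarce, which is point 3) of the prototype.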

            Benjamin Mahler (bmahler) added a comment:

            Mitigations have been provided via a GPU framework capability (MESOS-5634), which is of course GPU-specific, and by allowing operators to exclude resources from fair sharing (see MESOS-5758).

            The GPU framework capability helps to reduce the likelihood that non-GPU workloads starve out GPU workloads that want to run on the GPU machines. There are caveats to this, for example:

            (1) If the framework is non-cooperative, it may fill GPU machines with non-GPU workloads, and there is currently no revocation mechanism to evict these to make room for the GPU workloads.

            (2) A mixed-workload framework (one that runs both GPU and non-GPU workloads) cannot tell in general if an offer is from an agent with GPUs present, so it must use attributes to guarantee that it does not place non-GPU workloads on the GPU machine.
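Caveat (2) can be illustrated with a small sketch. It assumes the operator tags every GPU agent with a text attribute; the attribute name `gpu_node` and the `SimpleOffer` type are hypothetical, and a real framework would read the attributes carried in the Mesos `Offer` message instead.

```cpp
#include <map>
#include <string>

// Hypothetical simplified offer: scalar resources plus the agent's text
// attributes (e.g. an operator-chosen gpu_node:true on GPU agents).
struct SimpleOffer {
  std::map<std::string, double> resources;
  std::map<std::string, std::string> attributes;
};

// Naive check that looks only at offered resources. It misses GPU agents
// whose GPUs are currently allocated elsewhere, since no "gpus" are offered.
bool offerHasGpus(const SimpleOffer& offer) {
  auto it = offer.resources.find("gpus");
  return it != offer.resources.end() && it->second > 0.0;
}

// Attribute-based check: the (assumed) gpu_node attribute marks every GPU
// agent regardless of what is currently free on it.
bool isGpuAgent(const SimpleOffer& offer) {
  auto it = offer.attributes.find("gpu_node");
  return it != offer.attributes.end() && it->second == "true";
}

// A mixed-workload framework keeps non-GPU tasks off GPU agents.
bool acceptForNonGpuTask(const SimpleOffer& offer) {
  return !isGpuAgent(offer);
}
```

An offer from a GPU agent whose only GPU is already in use carries no `gpus` resource, so the resource-based check alone would wrongly treat it as a regular agent; the attribute check still rejects it for non-GPU work.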

            The fairness exclusion list allows the operator to ensure that the GPU allocation does not quickly dominate the share of the role.


              People

              • Assignee: Guangya Liu (gyliu)
              • Reporter: Benjamin Mahler (bmahler)
              • Shepherd: Benjamin Mahler
              • Votes: 0
              • Watchers: 10
