  1. Mesos
  2. MESOS-6774

Role sorter and quota role sorter can have more copies of shared resources in allocations than in total.


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: allocation
    • Labels: None

    Description

      The allocator supports shared resources by allocating multiple copies of the same shared resource so that multiple frameworks can receive it. Multiple copies of the same shared resource do not affect the quantity of the sorter's allocations or total pool, so they have no impact on DRF.
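
      The distinction between copies and quantity can be illustrated with a minimal sketch. This is a hypothetical toy model, not the Mesos `Resources` class: the names `ToyPool`, `SharedCopies`, `addCopy`, `copies` and `quantity` are assumptions made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy model, not Mesos code: a shared resource is tracked as
// the size of one copy plus the number of copies currently held.
struct SharedCopies {
  double size = 0.0;  // Size of a single copy, e.g. disk in MB.
  int copies = 0;     // Number of copies held in this pool.
};

class ToyPool {
public:
  // Adds one more copy of the named shared resource to the pool.
  void addCopy(const std::string& name, double size) {
    SharedCopies& entry = shared_[name];
    entry.size = size;
    entry.copies += 1;
  }

  // Total number of copies held across all shared resources.
  int copies() const {
    int total = 0;
    for (const auto& kv : shared_) {
      total += kv.second.copies;
    }
    return total;
  }

  // Quantity as used for fair-share math in this sketch: each shared
  // resource counts once, no matter how many copies are held.
  double quantity() const {
    double total = 0.0;
    for (const auto& kv : shared_) {
      if (kv.second.copies > 0) {
        total += kv.second.size;
      }
    }
    return total;
  }

private:
  std::map<std::string, SharedCopies> shared_;
};
```

      In this sketch, adding a second copy of a 100 MB shared disk raises `copies()` from 1 to 2 while `quantity()` stays at 100, which is why extra copies would not shift a share computed from quantities.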

      To make resource accounting work, though, when copies of the same resource are added to a framework's allocation, we also increase the size of the total pool in the sorter (again, adding these copies doesn't affect quantity) so that the allocations in a sorter are always bounded by the total pool in the sorter. This invariant is required for the following logic in the allocator to work:

      Remove the resources from the framework sorter when they are unallocated from the framework:
            frameworkSorters[role]->unallocated(
                frameworkId.value(), slaveId, resources);
            frameworkSorters[role]->remove(slaveId, resources);
      

      For example, if 2 copies of a shared disk are allocated to framework1, the sorter's total pool has 2 copies of the disk as well.
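
      The invariant can be made concrete with a small sketch. This is a hypothetical toy sorter that tracks only copy counts; it is not the Mesos `Sorter` interface, and `ToySorter`, `allocationsBounded` and `totalCopies` are names assumed for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy sorter, not Mesos code. It tracks how many copies of each
// shared resource are in the total pool and in each client's allocation.
class ToySorter {
public:
  // Grows the total pool by one copy of `resource`.
  void add(const std::string& resource) { total_[resource] += 1; }

  // Shrinks the total pool by one copy of `resource`.
  void remove(const std::string& resource) {
    assert(total_[resource] > 0);
    total_[resource] -= 1;
  }

  // Records one copy of `resource` as allocated to `client`.
  void allocated(const std::string& client, const std::string& resource) {
    allocations_[client][resource] += 1;
  }

  // Records one copy of `resource` as no longer allocated to `client`.
  void unallocated(const std::string& client, const std::string& resource) {
    assert(allocations_[client][resource] > 0);
    allocations_[client][resource] -= 1;
  }

  // Returns the number of copies of `resource` in the total pool.
  int totalCopies(const std::string& resource) const {
    auto it = total_.find(resource);
    return it == total_.end() ? 0 : it->second;
  }

  // Checks the invariant: for every resource, the copies allocated across
  // all clients never exceed the copies in the total pool.
  bool allocationsBounded() const {
    std::map<std::string, int> allocatedCopies;
    for (const auto& client : allocations_) {
      for (const auto& entry : client.second) {
        allocatedCopies[entry.first] += entry.second;
      }
    }
    for (const auto& entry : allocatedCopies) {
      if (entry.second > totalCopies(entry.first)) {
        return false;
      }
    }
    return true;
  }

private:
  std::map<std::string, int> total_;
  std::map<std::string, std::map<std::string, int>> allocations_;
};
```

      With this toy model, calling `add("disk")` before each `allocated("framework1", "disk")` leaves 2 copies in both the allocation and the total pool, and a later `unallocated` followed by `remove` keeps `allocationsBounded()` true. If `allocated` is called twice against a pool that holds only one copy, the check fails.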

      However, we currently do this only for the framework sorter below a role, because the allocator (implicitly) assumes that the role sorter, being the root-level sorter, has a total pool that is unchanged during allocation or resource recovery. This is not a problem right now because, for this reason, Sorter::add(const SlaveID& slaveId, const Resources& resources) and Sorter::remove(const SlaveID& slaveId, const Resources& resources) are not called on the role sorter during allocation or resource recovery.

      This will likely change with MESOS-6375, when role sorters become hierarchical and not all of them are bound to the physical size of the cluster. We should revisit the shared resource allocation logic then to make sure that the invariant (the allocations in a sorter are always bounded by the total pool in the sorter) holds.

            People

              Assignee: Unassigned
              Reporter: Yan Xu (xujyan)