Mesos / MESOS-5698

Quota sorter not updated for resource changes at agent.

    Details

    • Sprint:
      Mesosphere Sprint 38
    • Story Points:
      5

      Description

      Consider this sequence of events:

      1. Slave connects, with 128MB of disk.
      2. Master offers the slave's resources to a framework.
      3. Framework creates a dynamic reservation for 1MB and a persistent volume of the same size on the slave's resources.
      => This invokes Master::apply, which invokes allocator->updateAllocation, which in turn invokes Sorter::update() on the framework sorter and the role sorter. If the framework's role has a configured quota, it also invokes update on the quota role sorter. In this case the framework's role has no quota, so the quota role sorter is not updated.
      => DRFSorter::update updates the total resources at a given slave, among other state. The new total is 127MB of unreserved disk and 1MB of reserved disk with a volume. Note that the quota role sorter still thinks the slave has 128MB of unreserved disk.
      4. The slave is removed from the cluster. HierarchicalAllocatorProcess::removeSlave invokes:

        roleSorter->remove(slaveId, slaves[slaveId].total);
        quotaRoleSorter->remove(slaveId, slaves[slaveId].total.nonRevocable());
      

      At this point, slaves[slaveId].total.nonRevocable() is 127MB of unreserved disk and 1MB of reserved disk with a volume. When we remove this from the quota role sorter, the sorter is left with a total of 1MB of unreserved disk for the removed slave, since that is the result of subtracting <127MB unreserved, 1MB reserved+volume> from <128MB unreserved>. The reserved 1MB has no matching entry in the sorter's stale total, so it is not subtracted from anything.
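
The subtraction above can be replayed with a toy model. This is a minimal sketch, not Mesos code: ToyResources, subtractMatching and leakedAfterRemove are hypothetical names, and the model assumes that subtraction only affects entries whose reservation/volume metadata match exactly, which is the behavior the arithmetic in the description relies on.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for Mesos' Resources: megabytes keyed by a label.
// "disk" means unreserved disk; "disk:reserved+volume" means disk that
// carries a reservation and a persistent volume.
using ToyResources = std::map<std::string, double>;

// Subtract `rhs` from `lhs`, touching only entries with an identical key.
// Entries in `rhs` that have no counterpart in `lhs` are skipped (assumed
// behavior, modelled on the ticket's arithmetic).
ToyResources subtractMatching(ToyResources lhs, const ToyResources& rhs) {
  for (const auto& entry : rhs) {
    auto it = lhs.find(entry.first);
    if (it == lhs.end()) {
      continue;  // nothing with matching metadata to subtract from
    }
    it->second -= entry.second;
    if (it->second <= 0.0) {
      lhs.erase(it);
    }
  }
  return lhs;
}

// Replays step 4 against the stale total held by the quota role sorter.
ToyResources leakedAfterRemove() {
  // What the quota role sorter still believes the slave has.
  ToyResources staleQuotaSorterTotal;
  staleQuotaSorterTotal["disk"] = 128.0;

  // What the allocator passes to remove() after the reservation was applied.
  ToyResources currentSlaveTotal;
  currentSlaveTotal["disk"] = 127.0;
  currentSlaveTotal["disk:reserved+volume"] = 1.0;

  return subtractMatching(staleQuotaSorterTotal, currentSlaveTotal);
}
```

Calling leakedAfterRemove() in this model leaves a single "disk" entry of 1MB behind for a slave that no longer exists.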

      The implications of this can't be good: at minimum, we're leaking resources for removed slaves in the quota role sorter. We're also introducing an inconsistency between total_.resources[slaveId] and total_.scalarQuantities, since the latter has already had volume/reservation information stripped out and therefore does not see the mismatch.
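
The inconsistency between the two totals can be sketched the same way. The model below is an illustration under stated assumptions, not the DRFSorter implementation: ToySorterTotals, stripLabel, removeFromTotals and totalsAfterStaleRemove are hypothetical names, and it assumes that removal subtracts the metadata-carrying resources from the per-slave view while subtracting metadata-stripped quantities from the aggregate view.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified mirror of the two views named in the ticket.
struct ToySorterTotals {
  // Like total_.resources[slaveId]: keyed by name plus reservation label.
  std::map<std::string, double> perSlave;
  // Like total_.scalarQuantities: keyed by bare resource name only.
  std::map<std::string, double> scalarQuantities;
};

// Drop everything from ':' onward, so "disk:reserved+volume" becomes "disk".
std::string stripLabel(const std::string& key) {
  return key.substr(0, key.find(':'));
}

// Assumed removal logic: exact-match subtraction on the per-slave view,
// name-only subtraction on the stripped aggregate view.
void removeFromTotals(ToySorterTotals& totals,
                      const std::map<std::string, double>& removed) {
  for (const auto& entry : removed) {
    auto it = totals.perSlave.find(entry.first);
    if (it != totals.perSlave.end()) {
      it->second -= entry.second;
      if (it->second <= 0.0) {
        totals.perSlave.erase(it);
      }
    }
    totals.scalarQuantities[stripLabel(entry.first)] -= entry.second;
  }
}

// Starts from the stale 128MB view and removes the slave's current total.
ToySorterTotals totalsAfterStaleRemove() {
  ToySorterTotals totals;
  totals.perSlave["disk"] = 128.0;
  totals.scalarQuantities["disk"] = 128.0;

  std::map<std::string, double> removed;
  removed["disk"] = 127.0;
  removed["disk:reserved+volume"] = 1.0;

  removeFromTotals(totals, removed);
  return totals;
}
```

In this model the stripped aggregate drops to 0MB of disk while the per-slave view still holds 1MB, so the two totals no longer agree.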

              People

              • Assignee:
                neilc Neil Conway
                Reporter:
                neilc Neil Conway
                Shepherd:
                Alex R
