Mesos / MESOS-2373

DRFSorter needs to distinguish resources from different slaves.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • Fix Version/s: 0.23.0
    • Component/s: allocation
    • Sprint: Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15
    • 2

    Description

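      Currently the DRFSorter aggregates total and allocated resources across all slaves. This only works for scalar resources: identical ranges (such as ports) from different slaves collapse into a single range when summed. The sorter needs to distinguish resources from different slaves.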

      Suppose we have 2 slaves and 1 framework. The framework is allocated all resources from both slaves.

      Resources slaveResources =
        Resources::parse("cpus:2;mem:512;ports:[31000-32000]").get();
      
      DRFSorter sorter;
      
      sorter.add(slaveResources);  // Add slave1 resources
      sorter.add(slaveResources);  // Add slave2 resources
      
      // The total resources in the sorter at this point are
      // cpus(*):4; mem(*):1024; ports(*):[31000-32000].
      // The scalar resources are aggregated correctly, but the ports are not.
      
      sorter.add("F");
      
      // The 2 calls to allocated() only work because we simply do:
      //   allocation[name] += resources;
      // without checking that 'resources' is available in the total.
      
      sorter.allocated("F", slaveResources);
      sorter.allocated("F", slaveResources);
      
      // At this point, sorter.allocation("F") is:
      // cpus(*):4; mem(*):1024; ports(*):[31000-32000].
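
      The loss of information can be reproduced outside Mesos with a toy model. The sketch below is only an illustration: ToyResources, makeSlaveResources, aggregatedTotal, perSlaveTotal and countPorts are hypothetical names, not Mesos APIs, and ports are modelled as a plain set of integers. It assumes that summing range resources behaves like a set union, which matches the totals shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for mesos::Resources (NOT the Mesos API).
struct ToyResources {
  double cpus = 0;
  double mem = 0;
  std::set<int> ports;  // A ranges resource modelled as a set of ports.

  // Scalars add up; ranges are merged as a set union (assumption).
  ToyResources& operator+=(const ToyResources& that) {
    cpus += that.cpus;
    mem += that.mem;
    ports.insert(that.ports.begin(), that.ports.end());
    return *this;
  }
};

// cpus:2;mem:512;ports:[31000-32000], as in the example above.
ToyResources makeSlaveResources() {
  ToyResources r;
  r.cpus = 2;
  r.mem = 512;
  for (int p = 31000; p <= 32000; ++p) {
    r.ports.insert(p);
  }
  return r;
}

// One aggregated total across slaves: the two port ranges collapse.
ToyResources aggregatedTotal() {
  ToyResources total;
  total += makeSlaveResources();  // slave1
  total += makeSlaveResources();  // slave2
  return total;
}

// Totals keyed by a (hypothetical) slave name: nothing is merged.
std::map<std::string, ToyResources> perSlaveTotal() {
  std::map<std::string, ToyResources> totals;
  totals["slave1"] += makeSlaveResources();
  totals["slave2"] += makeSlaveResources();
  return totals;
}

// Number of ports available across all slaves in a per-slave view.
std::size_t countPorts(const std::map<std::string, ToyResources>& totals) {
  std::size_t count = 0;
  for (const auto& entry : totals) {
    count += entry.second.ports.size();
  }
  return count;
}
```

      In this model the aggregated total reports 4 cpus but only one slave's worth of ports, while the per-slave view keeps both port ranges.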
      

      To provide some context, this issue came up while trying to reserve all unreserved resources from every offer.

      for (const Offer& offer : offers) { 
        Resources unreserved = offer.resources().unreserved();
        Resources reserved = unreserved.flatten(role, Resource::FRAMEWORK); 
      
        Offer::Operation reserve;
        reserve.set_type(Offer::Operation::RESERVE); 
        reserve.mutable_reserve()->mutable_resources()->CopyFrom(reserved); 
       
        driver->acceptOffers({offer.id()}, {reserve}); 
      } 
      

      Suppose the slave resources are the same as above:

      Slave1: cpus(*):2; mem(*):512; ports(*):[31000-32000]
      Slave2: cpus(*):2; mem(*):512; ports(*):[31000-32000]

      The initial (incorrect) total resources in the DRFSorter are:

      cpus(*):4; mem(*):1024; ports(*):[31000-32000]

      We receive 2 offers, 1 from each slave:

      Offer1: cpus(*):2; mem(*):512; ports(*):[31000-32000]
      Offer2: cpus(*):2; mem(*):512; ports(*):[31000-32000]

      At this point, the resources allocated to the framework are:

      cpus(*):4; mem(*):1024; ports(*):[31000-32000]

      After the first RESERVE operation with Offer1, the allocated resources for the framework become:

      cpus(*):2; mem(*):512; cpus(role):2; mem(role):512; ports(role):[31000-32000]

      During second RESERVE operation with Offer2:

      HierarchicalAllocatorProcess::updateAllocation
        // ...
      
        FrameworkSorter* frameworkSorter =
          frameworkSorters[frameworks[frameworkId].role];
      
        Resources allocation = frameworkSorter->allocation(frameworkId.value());
      
        // Update the allocated resources.
        Try<Resources> updatedAllocation = allocation.apply(operations);
        CHECK_SOME(updatedAllocation);
      
        // ...
      

      allocation in the above code is:

      cpus(*):2; mem(*):512; cpus(role):2; mem(role):512; ports(role):[31000-32000]

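      We try to apply the RESERVE operation for Offer2, which requires ports(*):[31000-32000] to be present in the allocation. The first RESERVE already converted the only copy to ports(role):[31000-32000], so allocation.apply(operations) returns an error and the CHECK fails at CHECK_SOME(updatedAllocation).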
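
      The failure sequence can be sketched with a toy model. Everything here is hypothetical (ToyAllocation, portRange and the two driver functions are illustrative names, not Mesos code). The reserve() method stands in for applying a RESERVE operation and returns false where the real code would get an error from apply(). Keying the allocation by slave is shown only as one way to picture "distinguishing resources from different slaves", not as the actual fix.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model of one framework's allocation (NOT the Mesos API).
struct ToyAllocation {
  double unreservedCpus = 0;
  std::set<int> unreservedPorts;
  double reservedCpus = 0;
  std::set<int> reservedPorts;

  // Mirrors "allocation[name] += resources": scalars add, ranges merge.
  void allocate(double cpus, const std::set<int>& ports) {
    unreservedCpus += cpus;
    unreservedPorts.insert(ports.begin(), ports.end());
  }

  // Toy RESERVE: move resources from unreserved to reserved.
  // Returns false if the unreserved part does not contain them.
  bool reserve(double cpus, const std::set<int>& ports) {
    if (unreservedCpus < cpus ||
        !std::includes(unreservedPorts.begin(), unreservedPorts.end(),
                       ports.begin(), ports.end())) {
      return false;
    }
    unreservedCpus -= cpus;
    for (int p : ports) {
      unreservedPorts.erase(p);
    }
    reservedCpus += cpus;
    reservedPorts.insert(ports.begin(), ports.end());
    return true;
  }
};

// Inclusive port range [begin, end] as a set.
std::set<int> portRange(int begin, int end) {
  std::set<int> ports;
  for (int p = begin; p <= end; ++p) {
    ports.insert(p);
  }
  return ports;
}

// Single aggregated allocation: does the second RESERVE succeed?
bool secondReserveSucceedsAggregated() {
  const std::set<int> ports = portRange(31000, 32000);
  ToyAllocation allocation;
  allocation.allocate(2, ports);  // Offer1
  allocation.allocate(2, ports);  // Offer2: ports merge into one range
  bool first = allocation.reserve(2, ports);
  assert(first);
  (void)first;
  return allocation.reserve(2, ports);  // ports(*) are already gone
}

// Allocation tracked per slave: does the second RESERVE succeed?
bool secondReserveSucceedsPerSlave() {
  const std::set<int> ports = portRange(31000, 32000);
  std::map<std::string, ToyAllocation> allocation;
  allocation["slave1"].allocate(2, ports);  // Offer1
  allocation["slave2"].allocate(2, ports);  // Offer2
  bool first = allocation["slave1"].reserve(2, ports);
  assert(first);
  (void)first;
  return allocation["slave2"].reserve(2, ports);
}
```

      In the aggregated case the second reserve() fails even though 2 unreserved cpus remain, because the unreserved ports were consumed by the first call. With per-slave tracking both calls succeed.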

            People

              mcypark Michael Park
              mcypark Michael Park
              Benjamin Mahler Benjamin Mahler
              Votes: 1
              Watchers: 5