Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 1.5.0
Description
On master failover, the master currently does not accumulate the resources used by offer operations. We create a data structure to hold this information, but we never update it.
hashmap<FrameworkID, Resources> usedByOperations;

if (provider.newOperations.isSome()) {
  foreachpair (const id::UUID& uuid,
               const Operation& operation,
               provider.newOperations.get()) {
    // Update to bookkeeping of operations.
    CHECK(!slave->operations.contains(uuid))
      << "New operation " << uuid.toString() << " is already known";

    Framework* framework = nullptr;
    if (operation.has_framework_id()) {
      framework = getFramework(operation.framework_id());
    }

    addOperation(framework, slave, new Operation(operation));
  }
}

allocator->addResourceProvider(
    slaveId,
    provider.newTotal.get(),
    usedByOperations);
Here usedByOperations is never updated, so an empty map is always passed to allocator->addResourceProvider.
This leads to problems when an operation becomes terminal and we try to recover the resources it used, since those resources might not be known to the framework sorter inside the hierarchical allocator.
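The missing accumulation step could look roughly like the sketch below. It is a simplified, standalone illustration and not the actual Mesos patch: the `Operation` struct, the scalar `Resources` alias, and the `accumulateUsedByOperations` helper are hypothetical stand-ins for the real types, and skipping terminal operations is an assumption about which operations still hold resources.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the Mesos types; not the real API.
using FrameworkID = std::string;
using Resources = double;  // A single scalar amount, for illustration only.

struct Operation {
  std::string uuid;
  bool hasFrameworkId;      // Mirrors operation.has_framework_id().
  FrameworkID frameworkId;
  Resources consumed;       // Resources the operation consumes.
  bool terminal;            // Assumption: terminal operations hold nothing.
};

// Sums, per framework, the resources consumed by non-terminal operations.
// Operations without a framework ID are not attributed to any framework.
std::map<FrameworkID, Resources> accumulateUsedByOperations(
    const std::vector<Operation>& operations)
{
  std::map<FrameworkID, Resources> usedByOperations;

  for (const Operation& operation : operations) {
    // Sanity check for this sketch: consumed amounts are non-negative.
    assert(operation.consumed >= 0);

    if (!operation.hasFrameworkId || operation.terminal) {
      continue;
    }

    // operator[] default-initializes the entry to 0 on first access.
    usedByOperations[operation.frameworkId] += operation.consumed;
  }

  return usedByOperations;
}
```

With a map populated this way, the allocator would learn about the used resources when the resource provider is added, so a later recovery of those resources would refer to amounts the framework sorter already tracks.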
Issue Links
- is caused by: MESOS-8207 Reconcile offer operations between resource providers, agents, and master (Resolved)
- is part of: MESOS-8582 Pass FrameworkInfo to agents when applying operations (Accepted)