Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Description
We were testing a custom resource estimator that reports oversubscribed resources without marking them as "revocable".
This unfortunately triggered the following check in the hierarchical allocator, which crashes the process:
void HierarchicalAllocatorProcess::updateSlave(...)
{
  ...
  // Check that all the oversubscribed resources are revocable.
  CHECK_EQ(oversubscribed, oversubscribed.revocable());
  ...
}
This definitely shouldn't happen in a production cluster. IMO, we should do both of the following:
1. Make sure incorrect resources are never sent from the agent (even crashing the agent process would be better);
2. Decline agent registration if its resources are incorrect, or even tell it to shut down, and possibly remove this check.
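
To illustrate the idea behind both points, here is a minimal sketch of a validation step that reports an error instead of aborting via CHECK. It is only an illustration: the `Resource` struct and `validateOversubscribed` function below are hypothetical stand-ins, not the actual Mesos types or APIs, and the real fix may look quite different.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a resource entry. This is NOT the Mesos
// `Resource` protobuf; it only carries what the sketch needs.
struct Resource {
  std::string name;
  double amount;
  bool revocable;
};

// Returns true if every oversubscribed resource is marked revocable.
// On failure, returns false and stores a description of the first
// offending resource in `*error` (hypothetical helper, assumed name).
bool validateOversubscribed(
    const std::vector<Resource>& oversubscribed,
    std::string* error)
{
  for (const Resource& resource : oversubscribed) {
    if (!resource.revocable) {
      *error = "Oversubscribed resource '" + resource.name +
               "' is not marked as revocable";
      return false;
    }
  }

  error->clear();
  return true;
}
```

With something along these lines, the agent could call the validation before sending an estimate (and fail fast locally), while the master could call it on receipt and reject the update or the registration, rather than relying on a CHECK in the allocator.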