Details
Documentation
Status: Open
Major
Resolution: Unresolved
Description
It recently came to my attention that a subset of offer operations (e.g. RESERVE, UNRESERVE) are implemented speculatively within the Mesos master. That is, the master applies the resource conversion internally *before* the conversion is checkpointed on the agent. The master may then re-offer the converted resource to a framework, even though the agent may not yet have checkpointed the conversion. If checkpointing on the agent fails, subsequent operations issued for the falsely offered resource will fail, because the master essentially "lied" to the framework about the true state of the supposedly converted resource.
It's also been explained to me that this case is expected to be rare. However, it can affect the design and implementation of framework state machines, so it's critical that this behavior be documented clearly, outside of the C++ code base.
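To make the sequence described above concrete, here is a minimal toy model in Python. It is only a sketch of the ordering problem, not Mesos code: `ToyMaster`, `ToyAgent`, `run`, and the resource name `disk(role-a)` are all hypothetical, and the model assumes the master never observes the checkpoint result, which simplifies whatever the real master does once it learns of a failure.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent; not a Mesos API."""
    checkpoint_succeeds: bool = True
    checkpointed: set = field(default_factory=set)

    def checkpoint(self, resource: str) -> bool:
        # Persist the converted resource; this step may fail.
        if self.checkpoint_succeeds:
            self.checkpointed.add(resource)
        return self.checkpoint_succeeds

    def apply_operation(self, resource: str) -> bool:
        # A later operation succeeds only if the conversion was
        # actually checkpointed on the agent.
        return resource in self.checkpointed


@dataclass
class ToyMaster:
    """Hypothetical stand-in for the master; not a Mesos API."""
    agent: ToyAgent
    offerable: set = field(default_factory=set)

    def reserve(self, resource: str) -> None:
        # Speculative step: the master's own view is updated first...
        self.offerable.add(resource)
        # ...and only then is the agent asked to checkpoint. In this toy
        # model the result is ignored (an assumption for illustration).
        self.agent.checkpoint(resource)

    def offer(self) -> set:
        # The master offers whatever its own view says is available.
        return set(self.offerable)


def run(checkpoint_succeeds: bool) -> bool:
    """Return whether a follow-up operation on the offered resource succeeds."""
    agent = ToyAgent(checkpoint_succeeds=checkpoint_succeeds)
    master = ToyMaster(agent)
    master.reserve("disk(role-a)")
    # The resource is offered in both cases, checkpointed or not.
    assert "disk(role-a)" in master.offer()
    return agent.apply_operation("disk(role-a)")
```

In this sketch `run(True)` and `run(False)` both offer the resource, but only the first lets the follow-up operation succeed. A framework state machine would therefore probably want to treat a converted resource as tentative until an operation on it actually succeeds.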