Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 1.7.0
- None
- Mesosphere RI-6 Sprint 2018-31, Storage R7 Sprint 33, Storage R8 Sprint 34
- 3
Description
Normally, frameworks are expected to checkpoint the agent ID and resource provider ID before accepting an offer with an OfferOperation. From this expectation comes the requirement in the v1 scheduler API that a framework must provide the agent ID and resource provider ID when acknowledging an offer operation status update. However, this expectation breaks down in two cases:
1. The framework might lose its checkpointed data, so it no longer remembers the agent ID or the resource provider ID.
2. Even if the framework checkpoints data, it could be sent a stale update: for example, the original ACK it sent to Mesos was lost, and it needs to re-ACK. If the framework deleted its checkpointed data after sending the ACK that was then dropped, it no longer has the agent ID or resource provider ID for the operation when the status update is replayed.
An easy remedy would be to add the agent ID and resource provider ID to the OperationStatus message received by the scheduler so that a framework can build a proper ACK for the update, even if it doesn't have access to its previously checkpointed information.
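To illustrate the proposed remedy, here is a rough sketch in Python. The classes are simplified stand-ins I made up for this example, loosely modeled on the messages discussed above; they are not the real Mesos protobuf bindings, and the field names are assumptions. The point is only that once the update carries both IDs, the ACK can be built from the update alone, with no lookup in checkpointed state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperationStatus:
    """Hypothetical, simplified status update as seen by the scheduler."""
    operation_id: str
    uuid: bytes
    # The two fields this issue proposes adding to the update.
    agent_id: Optional[str] = None
    resource_provider_id: Optional[str] = None


@dataclass(frozen=True)
class AcknowledgeOperationStatus:
    """Hypothetical, simplified acknowledgement sent back to Mesos."""
    operation_id: str
    uuid: bytes
    agent_id: Optional[str]
    resource_provider_id: Optional[str]


def build_ack(status: OperationStatus) -> AcknowledgeOperationStatus:
    """Build an ACK using only the contents of the update itself."""
    return AcknowledgeOperationStatus(
        operation_id=status.operation_id,
        uuid=status.uuid,
        agent_id=status.agent_id,
        resource_provider_id=status.resource_provider_id,
    )


# A replayed update arrives after the framework has already deleted its
# checkpointed data; the ACK is still fully populated.
replayed = OperationStatus(
    operation_id="op-1",
    uuid=b"\x01\x02",
    agent_id="agent-1",
    resource_provider_id="rp-1",
)
ack = build_ack(replayed)
```

With this shape, both failure cases above (lost checkpoint, replay after a dropped ACK) are handled by the same code path, since `build_ack` never consults framework-side state.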
I'm filing this as a BUG because there's no way to reliably use the offer operation status API until this has been fixed.
Issue Links
- is related to MESOS-9455 Add tests for operation status acknowledgement for different combinations of uuid, agent_id and resource_provider_id (Open)