Details
Type: Task
Status: Open
Priority: Major
Resolution: Unresolved
Sprint: Mesosphere Sprint 77
Story Points: 3
Description
Currently, the agent process exits when it encounters an error while checkpointing its resources to disk. Now that the master sends ApplyOperationMessage to the agent to apply operations, we can implement operation feedback on the agent, so the agent no longer needs to terminate unconditionally when checkpointing fails.
For backward compatibility with older masters, the agent should still terminate if it receives a CheckpointResourcesMessage from the master and an error is encountered while checkpointing.
However, when checkpointing is attempted in the handler for ApplyOperationMessage, the agent can handle errors by sending a terminal operation update to the master.
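The two code paths described above could be sketched roughly as follows. This is a simplified illustration, not the actual agent code: `AgentSketch`, `OperationUpdate`, `Outcome`, and both handler names are hypothetical stand-ins, the checkpoint step is an injected callback that returns an empty string on success, and "sending" an update is modeled by appending to a vector.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the real Mesos types.
enum class OperationState { FINISHED, FAILED };  // both are terminal states

struct OperationUpdate {
  std::string operationId;
  OperationState state;
  std::string message;
};

// What the caller should do after a message has been handled.
enum class Outcome { OK, AGENT_MUST_EXIT };

class AgentSketch {
public:
  // `checkpoint` returns an error description, or an empty string on success.
  explicit AgentSketch(std::function<std::string()> checkpoint)
    : checkpoint_(std::move(checkpoint)) {
    assert(checkpoint_);  // a callable must be supplied
  }

  // Legacy path (CheckpointResourcesMessage from an older master): there is
  // no feedback channel, so a checkpointing error is still fatal.
  Outcome handleCheckpointResources() {
    return checkpoint_().empty() ? Outcome::OK : Outcome::AGENT_MUST_EXIT;
  }

  // New path (ApplyOperationMessage): a checkpointing error is reported to
  // the master as a terminal operation update and the agent keeps running.
  Outcome handleApplyOperation(const std::string& operationId) {
    const std::string error = checkpoint_();
    if (!error.empty()) {
      sent_.push_back({operationId, OperationState::FAILED, error});
    } else {
      sent_.push_back({operationId, OperationState::FINISHED, ""});
    }
    return Outcome::OK;
  }

  // Updates that would have been sent to the master, in order.
  const std::vector<OperationUpdate>& sentUpdates() const { return sent_; }

private:
  std::function<std::string()> checkpoint_;
  std::vector<OperationUpdate> sent_;
};
```

With a checkpoint callback that always fails, `handleCheckpointResources()` reports that the agent must exit, while `handleApplyOperation("op-1")` returns normally and records a single `FAILED` update carrying the error text. Injecting the checkpoint step as a callback is only a convenience here, so that the failure path can be exercised without touching the disk.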