Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
Mesos Foundations RI11 Sp 40
-
2
Description
If the agent fails over after having checkpointed a new operation but before the operation status update stream is created, the recovery process will fail.
This happens because the agent tries to recover an operation status update stream for every checkpointed operation, even if that stream has not been created yet.
To prevent recovery failures, the agent should obtain the IDs of the streams to recover by walking the directory in which operation status update streams are stored.
The agent should also garbage-collect any stream for which the checkpointed state does not contain a corresponding operation.
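The proposed recovery logic can be illustrated with a minimal sketch. This is not Mesos code (the agent is written in C++). It assumes, purely for illustration, that each stream is stored as a subdirectory named after its operation ID; the function names `classify_streams` and `recover_streams` are hypothetical.

```python
import os
import shutil


def classify_streams(streams_dir, checkpointed_operation_ids):
    """Split the stream IDs found on disk into (to_recover, orphaned).

    Assumption: one subdirectory per stream, named after the operation ID.
    """
    if not os.path.isdir(streams_dir):
        # Nothing has been written yet, so there is nothing to recover.
        return set(), set()

    # Walk the directory instead of trusting the checkpointed operations:
    # an operation may be checkpointed before its stream exists.
    on_disk = {entry.name for entry in os.scandir(streams_dir) if entry.is_dir()}
    known = set(checkpointed_operation_ids)

    return on_disk & known, on_disk - known


def recover_streams(streams_dir, checkpointed_operation_ids):
    """Garbage-collect orphaned streams and return the IDs to recover."""
    to_recover, orphaned = classify_streams(streams_dir, checkpointed_operation_ids)

    for stream_id in orphaned:
        # The stream has no corresponding checkpointed operation.
        shutil.rmtree(os.path.join(streams_dir, stream_id))

    return sorted(to_recover)
```

With this approach, an operation that was checkpointed before its stream was created is simply absent from the directory listing, so it no longer causes recovery to fail.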