Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- None
Description
When a new slave attempts to register, the registry is updated first, then the master's in-memory state is updated if the registry operation is applied successfully. However, when a slave is removed or marked unreachable, the master first updates its in-memory state, then updates the registry. This has two problems:
1. It makes it harder to reason about the correctness of concurrent operations that read in-memory state and update the registry.
2. It can leak incorrect information via the HTTP endpoints. If the master updates its in-memory state when removing a slave or marking it unreachable, that change is immediately observable via the HTTP endpoints. If the registry operation then fails and the master fails over, the information the endpoints returned will have been incorrect. The master has special code to avoid this inaccuracy for reconciliation (see Master::transitioning()), but not for the endpoints.
I think it is simpler to just always update the registry first.
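To make the proposed ordering concrete, here is a minimal synchronous sketch of "registry first, in-memory state second". It is not the Mesos implementation: the `Registry` and `Master` classes, their methods, and the `failNext` switch are hypothetical names invented for illustration, and the real master applies registry operations asynchronously.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for the persistent registry (not the Mesos API).
class Registry {
public:
  // When true, the next operation fails, simulating a failed registry write.
  bool failNext = false;

  bool admit(const std::string& slaveId) {
    if (consumeFailure()) return false;
    return persisted.insert(slaveId).second;
  }

  bool remove(const std::string& slaveId) {
    if (consumeFailure()) return false;
    return persisted.erase(slaveId) > 0;
  }

private:
  bool consumeFailure() {
    if (!failNext) return false;
    failNext = false;
    return true;
  }

  std::set<std::string> persisted;
};

// Hypothetical master that only mutates in-memory state after the
// corresponding registry operation has been applied successfully.
class Master {
public:
  explicit Master(Registry& r) : registry(r) {}

  bool registerSlave(const std::string& slaveId) {
    if (!registry.admit(slaveId)) return false;  // Registry first.
    slaves.insert(slaveId);                      // Then in-memory state.
    return true;
  }

  bool removeSlave(const std::string& slaveId) {
    if (!registry.remove(slaveId)) return false; // Registry first.
    slaves.erase(slaveId);                       // Then in-memory state.
    return true;
  }

  // What an HTTP endpoint would observe in this sketch.
  bool isRegistered(const std::string& slaveId) const {
    return slaves.count(slaveId) > 0;
  }

private:
  Registry& registry;
  std::set<std::string> slaves;
};
```

With this ordering, a failed registry write leaves the in-memory view untouched, so a reader of `isRegistered` never sees a removal that was not persisted. Registration and removal also follow the same pattern, which is the consistency the description asks for.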
Issue Links
- blocks: MESOS-5965 Implement garbage collection for unreachable agent lists in registry (Resolved)