Description
When the agent receives a DrainSlaveMessage while it has no tasks or operations, it still writes the DrainConfig to disk and is then implicitly stuck in a "draining" state indefinitely. For example, if agent reregistration is triggered at such a time, the master may consider the agent to be operating normally and send it a task. That task will then fail because the agent still believes it is draining (see this test for an example: https://reviews.apache.org/r/72364/).
If the agent receives a DrainSlaveMessage while it has no tasks or operations, it should not write a DrainConfig to disk, so that it immediately "transitions" into the already-drained state.
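A minimal behavioral sketch of the proposed change follows. It is a simplified model written in Python, not the actual Mesos agent code. `AgentModel`, `handle_drain`, `launch_task`, and the `DrainConfig` stand-in are all hypothetical names. The model assumes the agent considers itself draining exactly when a DrainConfig is checkpointed, and that it rejects task launches while draining.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DrainConfig:
    """Hypothetical stand-in for the agent's DrainConfig (not the real protobuf)."""
    mark_gone: bool = False


@dataclass
class AgentModel:
    """Hypothetical, simplified agent model; all names are illustrative only."""
    tasks: int = 0
    operations: int = 0
    checkpointed: Optional[DrainConfig] = None  # models the DrainConfig on disk

    def is_draining(self) -> bool:
        # Assumption: the agent is "draining" whenever a DrainConfig is checkpointed.
        return self.checkpointed is not None

    def handle_drain(self, config: DrainConfig) -> None:
        # Proposed behavior: with no tasks or operations there is nothing to
        # drain, so skip the checkpoint and remain in the drained state.
        if self.tasks == 0 and self.operations == 0:
            return
        self.checkpointed = config  # stands in for writing DrainConfig to disk

    def launch_task(self) -> bool:
        # Task launches are rejected while the agent thinks it is draining.
        if self.is_draining():
            return False
        self.tasks += 1
        return True


# An idle agent receives a drain request. Later the master sends it a task
# (e.g. after reregistration). With the proposed change, the launch succeeds.
idle = AgentModel()
idle.handle_drain(DrainConfig())
print(idle.is_draining(), idle.launch_task())  # False True

# An agent with a running task still checkpoints the config and drains.
busy = AgentModel(tasks=1)
busy.handle_drain(DrainConfig())
print(busy.is_draining(), busy.launch_task())  # True False
```

In this model, the only change from the buggy behavior is the early return in `handle_drain`. Without it, the idle agent would checkpoint the config and reject the later task, which reproduces the failure described above.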
Issue Links
- relates to: MESOS-10116 Attempt to reactivate disconnected agent crashes the master (Resolved)