Details
- Improvement
- Status: Reviewable
- Major
- Resolution: Unresolved
- None
- None
Description
For maintenance, operators sometimes force a slave to drain (via SIGUSR1) when it is deemed safe (e.g. only non-critical tasks are running) and/or necessary (e.g. bad hardware).
To eliminate alerting noise, we'd like to add a 'Reason' that identifies the forced drain of the slave, so that the resulting task losses are not treated as a generic slave-removal TASK_LOST.
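To illustrate the intended effect on alerting, here is a minimal sketch of how a consumer of task status updates might tell an operator-forced drain apart from a generic slave removal. It is not Mesos code: the `should_alert` helper is hypothetical, and `REASON_SLAVE_DRAINED_BY_OPERATOR` is an assumed name for the proposed reason, used only for illustration.

```python
from enum import Enum


class Reason(Enum):
    # Generic reason reported today when a slave is removed.
    REASON_SLAVE_REMOVED = "REASON_SLAVE_REMOVED"
    # Hypothetical name for the proposed operator-forced-drain reason.
    REASON_SLAVE_DRAINED_BY_OPERATOR = "REASON_SLAVE_DRAINED_BY_OPERATOR"


def should_alert(state: str, reason: Reason) -> bool:
    """Return True if a task status update should raise an alert."""
    # Only lost tasks are considered here; other states are out of scope.
    if state != "TASK_LOST":
        return False
    # A forced drain is an expected, operator-initiated event, so skip it.
    return reason is not Reason.REASON_SLAVE_DRAINED_BY_OPERATOR
```

With a distinct reason, a TASK_LOST caused by a forced drain can be filtered out while every other TASK_LOST still alerts.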
Issue Links
- is related to
  - MESOS-3265 Starting maintenance needs to deactivate agents and kill tasks. (Resolved)
- relates to
  - MESOS-9298 Task failures sometimes can't be understood without looking into agent logs. (Open)
  - MESOS-1475 Provide a way to fully shut down a slave (kill all tasks underneath). (Resolved)