Details
- Type: Epic
- Status: Open
- Priority: Major
- Resolution: Unresolved
- task-failure-reasons
Description
Mesos communicates task state transitions via task status updates. These updates often include a reason, which is meant to hint at what exactly went wrong. However, these reasons are often:
- misleading
- vague
- generic.
This makes it hard to triage why a task actually failed and results in a bad user experience. Failures can originate from many different sources: the fetcher, isolators (including custom ones!), namespace setup, etc.
This epic aims to improve the UX by providing detailed, ideally typed, information about task failures.
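As a rough illustration of what "typed" failure information could look like, here is a minimal Python sketch. It is not Mesos code and does not reflect any existing Mesos API: `FailureSource`, `FailureDetail`, `StatusUpdate`, and `describe` are hypothetical names invented for this example, the failure sources are taken from the list above, and the state and reason strings are used only as plain placeholder values.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureSource(Enum):
    """Hypothetical failure sources, taken from the description above."""
    FETCHER = "fetcher"
    ISOLATOR = "isolator"
    NAMESPACE_SETUP = "namespace setup"


@dataclass(frozen=True)
class FailureDetail:
    """Hypothetical typed payload; not part of any Mesos API."""
    source: FailureSource
    component: str  # e.g. the name of the (possibly custom) isolator
    message: str    # human-readable explanation of the failure


@dataclass(frozen=True)
class StatusUpdate:
    """Simplified stand-in for a task status update."""
    state: str
    reason: Optional[str] = None            # generic reason, may be missing
    detail: Optional[FailureDetail] = None  # typed detail, may be missing


def describe(update: StatusUpdate) -> str:
    """Render an update, including the typed detail when it is present."""
    parts = [update.state]
    if update.reason is not None:
        parts.append(f"reason={update.reason}")
    if update.detail is not None:
        d = update.detail
        parts.append(f"source={d.source.value} ({d.component}): {d.message}")
    return ", ".join(parts)


# A generic update only says that something failed.
generic = StatusUpdate("TASK_FAILED", "REASON_CONTAINER_LAUNCH_FAILED")

# A typed update also says where the failure came from and why.
typed = StatusUpdate(
    "TASK_FAILED",
    "REASON_CONTAINER_LAUNCH_FAILED",
    FailureDetail(FailureSource.FETCHER, "fetcher", "artifact download failed"),
)

print(describe(generic))
print(describe(typed))
```

With only the generic reason, a user cannot tell the fetcher from an isolator as the culprit; the typed detail makes the source explicit without changing the existing reason field.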
Issue Links
- is related to
  - MESOS-8531 Some task status updates sent by the default executor don't contain a REASON. (Accepted)
  - MESOS-2077 Ensure that TASK_LOSTs for a hard slave drain (SIGUSR1) include a Reason. (Reviewable)