Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
We need to track the number of failed task attempts; otherwise they can remain "invisible" when a DAG succeeds but with degraded performance. Ideally, the counters should also let us see the overall time spent in FAILED attempts.
UPDATE: we already have NUM_FAILED_TASKS, which might be misleading because it actually counts attempts rather than tasks. We still need to aggregate the time spent in those attempts.
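To make the request concrete, here is a rough, language-agnostic sketch of the aggregation being asked for, written in Python for brevity. It is not the actual implementation: the `Attempt` record, the `failed_attempt_counters` helper, and both counter names in the result are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """Hypothetical record of one task attempt (not a real API type)."""
    task_id: str
    state: str      # e.g. "SUCCEEDED" or "FAILED"
    start_ms: int   # attempt start time, in milliseconds
    end_ms: int     # attempt end time, in milliseconds


def failed_attempt_counters(attempts):
    """Count FAILED attempts and sum the wall-clock time spent in them.

    The counter names below are assumptions made for this sketch.
    """
    failed = [a for a in attempts if a.state == "FAILED"]
    return {
        "NUM_FAILED_TASK_ATTEMPTS": len(failed),
        "FAILED_ATTEMPTS_WALL_CLOCK_MILLIS": sum(
            a.end_ms - a.start_ms for a in failed
        ),
    }


if __name__ == "__main__":
    # Task t1 fails once and then succeeds; task t2 has one failed attempt.
    attempts = [
        Attempt("t1", "FAILED", 0, 4000),
        Attempt("t1", "SUCCEEDED", 4000, 9000),
        Attempt("t2", "FAILED", 1000, 2500),
    ]
    print(failed_attempt_counters(attempts))
```

In this example t1 eventually succeeds, so a view that only looks at final task state would hide the 4000 ms lost in its first attempt. Summing per attempt rather than per task is what exposes that time.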