Details
- Sub-task
- Status: Resolved
- Major
- Resolution: Won't Fix
- None
- None
Description
Per design in YARN-2928, consider various ATS writer failure scenarios, and implement proper handling.
For example, an ATS writer may fail and exit due to an out-of-memory (OOM) error. In that case, the writer should be retried a certain number of times. We also need to tie fatal ATS writer failures (those that remain after all retries are exhausted) to the failure of the application, and so on.
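The retry-then-fail behavior described above could look roughly like the following sketch. It is only an illustration, not the design from YARN-2928: the class `WriterRetrySketch`, the `FatalFailureHandler` interface, and the `maxRetries` parameter are all hypothetical names, and the writer is modeled as an in-process `Callable` even though a real ATS writer may run as a separate process whose exit would have to be detected by other means.

```java
import java.util.concurrent.Callable;

public class WriterRetrySketch {

  /** Hypothetical hook that ties a fatal writer failure to the application. */
  public interface FatalFailureHandler {
    void onFatalFailure(String appId, Throwable cause);
  }

  /**
   * Runs the writer once and, on failure, retries it up to maxRetries more times.
   * Returns true if some attempt completed normally. If every attempt fails,
   * the handler is notified once and false is returned.
   */
  public static boolean runWithRetries(String appId, Callable<Void> writer,
      int maxRetries, FatalFailureHandler handler) {
    Throwable last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        writer.call();
        return true;
      } catch (Throwable t) {
        // Catch Throwable (not just Exception) so OutOfMemoryError is covered.
        last = t;
      }
    }
    // All retries exhausted: escalate so the caller can fail the application.
    handler.onFatalFailure(appId, last);
    return false;
  }
}
```

A caller would pass a handler that marks the application as failed. What that handler should actually do in YARN is exactly the open question this issue raises, so it is left abstract here.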
Issue Links
- depends upon: YARN-3033 [Collector wireup] Implement NM starting the standalone timeline collector daemon (Resolved)