Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Some users have reported cases where a task fails because of an environment or configuration problem on one machine, and the task is then reattempted on that same faulty machine until the entire job fails because that single task has failed too many times.
To guard against this, maybe we should add some randomization in how we reschedule failed tasks.
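As a rough illustration of the idea, the sketch below picks a host for a retried task at random instead of reusing the same one. It is not Spark's scheduler code: the function name `pick_host_for_retry` and its parameters are hypothetical, and the step that skips hosts where the task has already failed is an assumption that goes slightly beyond the plain randomization proposed here.

```python
import random


def pick_host_for_retry(hosts, failed_hosts, rng=random):
    """Pick a host for a retried task (hypothetical helper, not a Spark API).

    hosts: all hosts that could run the task.
    failed_hosts: hosts where this task has already failed.
    rng: any object with a choice() method, such as random.Random.
    """
    if not hosts:
        raise ValueError("no hosts available")
    # Assumption: prefer hosts on which this task has not failed yet.
    candidates = [h for h in hosts if h not in failed_hosts]
    # If the task has failed on every host, fall back to the full list.
    if not candidates:
        candidates = list(hosts)
    # Random choice, so repeated retries are not pinned to one machine.
    return rng.choice(candidates)
```

With `hosts = ["a", "b", "c"]` and `failed_hosts = {"a"}`, the retry lands on `"b"` or `"c"`. Even without the exclusion step, a uniform random choice makes it unlikely that every retry hits the same faulty machine when several hosts are available.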
Issue Links
- is related to SPARK-2425 Standalone Master is too aggressive in removing Applications (Closed)