Details
- Improvement
- Status: Resolved
- Major
- Resolution: Incomplete
- None
- None
Description
In a standalone cluster, when an executor launch fails, the master should avoid re-launching it on the same worker.
Under the current scheduling logic, the failed executor is very likely to be re-launched on the same worker, where it fails again. The repeated failures eventually cause the application to be removed from the master.
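The proposed behavior can be illustrated with a minimal sketch. This is not Spark's actual Master code: the `Master` class, its method names, and the `max_failures_per_app` threshold are all hypothetical, chosen only to show the idea of remembering which workers failed a launch for an application and skipping them on retry.

```python
from collections import defaultdict


class Master:
    """Toy stand-in for a standalone master (hypothetical, not Spark's Master)."""

    def __init__(self, workers, max_failures_per_app=10):
        self.workers = list(workers)            # worker ids, in scheduling order
        # Assumed threshold: the value 10 is a placeholder for illustration.
        self.max_failures_per_app = max_failures_per_app
        self.failed_on = defaultdict(set)       # app id -> workers where a launch failed
        self.failure_count = defaultdict(int)   # app id -> total launch failures

    def pick_worker(self, app_id):
        """Return the first worker with no failed launch for this app, or None."""
        for worker in self.workers:
            if worker not in self.failed_on[app_id]:
                return worker
        return None

    def on_launch_failed(self, app_id, worker):
        """Record the failure; return the next worker to try, or None to give up."""
        self.failed_on[app_id].add(worker)
        self.failure_count[app_id] += 1
        if self.failure_count[app_id] >= self.max_failures_per_app:
            return None  # too many failures: the application would be removed
        return self.pick_worker(app_id)
```

With two workers, a failed launch on the first one moves the retry to the second instead of repeating on the same worker. For example, `Master(["worker-1", "worker-2"]).on_launch_failed("app-1", "worker-1")` returns `"worker-2"`. The exclusion set is kept per application, so one application's failures do not bar a worker for other applications.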
Issue Links
- is duplicated by
  - SPARK-1499 Workers continuously produce failing executors (Resolved)
  - SPARK-4609 Job can not finish if there is one bad slave in clusters (Resolved)
  - SPARK-6353 Handling fatal errors of executors and decommission datanodes (Resolved)
- relates to
  - SPARK-4609 Job can not finish if there is one bad slave in clusters (Resolved)
- links to