[SPARK-6183] Skip bad workers when re-launching executors - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: None
Fix Version/s: None
Component/s: Deploy
Labels:
- bulk-closed

Description

In a standalone cluster, when an executor launch fails, the master should avoid re-launching the executor on the same worker.
Under the current scheduling logic, the failed executor is very likely to be re-launched on that same worker, which eventually causes the master to remove the application.
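The scheduling change proposed here can be illustrated with a minimal Python sketch. It is not Spark's Master code (which is written in Scala): `Worker`, `AppState`, `record_launch_failure`, and `pick_worker` are hypothetical names, and preferring the worker with the most free cores is an assumption made only for illustration. The idea is to remember, per application, the workers on which an executor launch has already failed and to exclude them from later placement decisions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Worker:
    # Hypothetical view of a worker's free resources.
    worker_id: str
    cores_free: int
    memory_free_mb: int


@dataclass
class AppState:
    # Hypothetical per-application scheduling state.
    app_id: str
    cores_per_executor: int
    memory_per_executor_mb: int
    # Workers on which an executor launch for this app has already failed.
    bad_workers: Set[str] = field(default_factory=set)


def record_launch_failure(app: AppState, worker_id: str) -> None:
    """Remember that launching an executor for `app` on `worker_id` failed."""
    app.bad_workers.add(worker_id)


def pick_worker(app: AppState, workers: List[Worker]) -> Optional[Worker]:
    """Choose a worker for the next executor, skipping known-bad workers.

    Returns None when no worker that has not already failed for this app
    has enough free cores and memory.
    """
    candidates = [
        w for w in workers
        if w.worker_id not in app.bad_workers
        and w.cores_free >= app.cores_per_executor
        and w.memory_free_mb >= app.memory_per_executor_mb
    ]
    if not candidates:
        return None
    # Assumed tie-break: prefer the worker with the most free cores.
    return max(candidates, key=lambda w: w.cores_free)


# Usage: after a failure on w1, the re-launch goes to w2 instead.
workers = [Worker("w1", 8, 8192), Worker("w2", 4, 4096)]
app = AppState("app-1", cores_per_executor=2, memory_per_executor_mb=1024)
print(pick_worker(app, workers).worker_id)  # prints "w1"
record_launch_failure(app, "w1")
print(pick_worker(app, workers).worker_id)  # prints "w2"
```

Returning `None` once every eligible worker has failed leaves the choice (wait for new workers, or give up on the application) to the caller. A real implementation would likely also need a way to clear entries, for example when a worker re-registers with the master.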

Issue Links

is duplicated by
- SPARK-1499 Workers continuously produce failing executors (Resolved)
- SPARK-4609 Job can not finish if there is one bad slave in clusters (Resolved)
- SPARK-6353 Handling fatal errors of executors and decommission datanodes (Resolved)

relates to
- SPARK-4609 Job can not finish if there is one bad slave in clusters (Resolved)

links to
- [Github] Pull Request #4909 (zhpengg)

People

Assignee: Unassigned

Reporter: Zhen Peng

Votes: 0

Watchers: 2

Dates

Created: 05/Mar/15 08:10

Updated: 21/May/19 05:36

Resolved: 21/May/19 05:36