Spark > SPARK-20624 SPIP: Add better handling for node shutdown > SPARK-32198

Don't fail running jobs when decommissioned executors finally go away


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.0
    • Component/s: Spark Core
    • Labels:
      None

      Description

      When a decommissioned executor is finally lost, its death shouldn't fail running jobs.

      A decommissioned executor will eventually die, and in response to its heartbeat failure we generate a `SlaveLost` message. That `SlaveLost` message should be treated specially for decommissioned executors: the loss should not be attributed to the running application. Decommissioning is an exogenous event, and the running application shouldn't be penalized for it.
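
      A minimal sketch of the intended behavior is below. The names (`DecommissionTracker`, `ExecutorLost`, `causedByApp`, `onHeartbeatTimeout`) are illustrative only, not Spark's actual internals; the point is that an executor whose decommission was requested earlier should have its eventual loss flagged as not caused by the application.

```scala
// Illustrative sketch: DecommissionTracker, ExecutorLost, causedByApp and
// onHeartbeatTimeout are hypothetical names, not Spark's real internals.
import scala.collection.mutable

// A loss reason that records whether the application should be blamed.
case class ExecutorLost(message: String, causedByApp: Boolean)

class DecommissionTracker {
  // Executors that were asked to decommission but are still heartbeating.
  private val decommissioning = mutable.Set[String]()

  def onDecommission(executorId: String): Unit = synchronized {
    decommissioning += executorId
  }

  // Called when the executor's heartbeat finally times out and it is lost.
  def onHeartbeatTimeout(executorId: String): ExecutorLost = synchronized {
    if (decommissioning.remove(executorId)) {
      // Decommissioning is exogenous: don't count this loss against the app.
      ExecutorLost(s"Executor $executorId was decommissioned and is now lost",
        causedByApp = false)
    } else {
      // An unexpected loss is attributed to the application as before.
      ExecutorLost(s"Executor $executorId lost unexpectedly", causedByApp = true)
    }
  }
}
```

      A scheduler consulting such a `causedByApp` flag would skip incrementing the job's failure counters when it is false, so the eventual death of a decommissioned executor does not fail the running job.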

       

       


            People

            • Assignee: Devesh Agrawal (dagrawal3409)
            • Reporter: Devesh Agrawal (dagrawal3409)
            • Shepherd: Holden Karau
            • Votes: 0
            • Watchers: 2
