Spark / SPARK-11178

Improve naming around task failures in scheduler code


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: Scheduler, Spark Core
    • Labels: None

    Description

      Commit af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0 introduced new functionality so that if an executor dies for a reason that's not caused by one of the tasks running on the executor (e.g., due to pre-emption), Spark doesn't count the failure towards the maximum number of failures for the task. That commit introduced some vague naming that I think we should fix; in particular:

      (1) The variable "isNormalExit" was used to refer to cases where the executor died for a reason unrelated to the tasks running on the machine. The problem with the existing name is that it's not clear (at least to me!) what it means for an exit to be "normal".

      (2) The variable "shouldEventuallyFailJob" is used to determine whether a task's failure should be counted towards the maximum number of failures allowed for a task before the associated Stage is aborted. The problem with the existing name is that it can be misread as implying that the task's failure should immediately cause the stage to fail because it is somehow fatal (as is the case for a fetch failure, for example: if a task fails because of a fetch failure, there's no point in retrying, and the whole stage should be failed). A short sketch of the counting behavior in question follows below.
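
      For illustration only, here is a minimal, self-contained sketch of the behavior described above: a failure that was not caused by the application (e.g. pre-emption) is not counted towards a task's failure limit, while application-caused failures are, and the stage is aborted once the limit is reached. This is not Spark's actual scheduler code; the names ExecutorExitInfo, exitCausedByApp, countTowardsTaskFailures, and TaskFailureTracker are illustrative placeholders, not the identifiers used in Spark.

      // Hypothetical sketch of the failure-counting behavior; names are illustrative,
      // not Spark's real scheduler classes or fields.
      case class ExecutorExitInfo(message: String, exitCausedByApp: Boolean)

      class TaskFailureTracker(maxTaskFailures: Int) {
        // Per-task count of failures that are attributable to the task/application.
        private val failureCounts =
          scala.collection.mutable.Map[Int, Int]().withDefaultValue(0)

        /** Record a failed task attempt; returns true if the stage should be aborted. */
        def recordFailure(taskIndex: Int, countTowardsTaskFailures: Boolean): Boolean = {
          if (countTowardsTaskFailures) {
            failureCounts(taskIndex) += 1
          }
          failureCounts(taskIndex) >= maxTaskFailures
        }
      }

      object Demo extends App {
        val tracker = new TaskFailureTracker(maxTaskFailures = 4)

        // Executor lost to pre-emption: the task is not at fault, so nothing is counted.
        val preempted = ExecutorExitInfo("container pre-empted", exitCausedByApp = false)
        println(tracker.recordFailure(taskIndex = 0,
          countTowardsTaskFailures = preempted.exitCausedByApp))  // false

        // Failures caused by the application do count and eventually abort the stage.
        val appCrash = ExecutorExitInfo("OutOfMemoryError in task", exitCausedByApp = true)
        val aborted = (1 to 4).map { _ =>
          tracker.recordFailure(taskIndex = 0,
            countTowardsTaskFailures = appCrash.exitCausedByApp)
        }
        println(aborted.last)  // true: the fourth counted failure reaches the limit
      }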


          People

            Assignee: Kay Ousterhout (kayousterhout)
            Reporter: Kay Ousterhout (kayousterhout)
            Votes: 0
            Watchers: 2
