SPARK-11178

Improve naming around task failures in scheduler code



    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: Scheduler, Spark Core
    • Labels:
    • Target Version/s:


      Commit af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0 introduced new functionality so that if an executor dies for a reason that's not caused by one of the tasks running on the executor (e.g., due to pre-emption), Spark doesn't count the failure towards the maximum number of failures for the task. That commit introduced some vague naming that I think we should fix; in particular:

      (1) The variable "isNormalExit" is used to refer to cases where the executor died for a reason unrelated to the tasks running on the machine. The problem with the existing name is that it's not clear (at least to me!) what it means for an exit to be "normal".

      (2) The variable "shouldEventuallyFailJob" is used to determine whether a task's failure should be counted towards the maximum number of failures allowed for a task before the associated Stage is aborted. The problem with the existing name is that it can be misread as meaning that the task's failure is somehow fatal and should immediately cause the stage to fail. (That is the case for a fetch failure, for example: if a task fails because of a fetch failure, there's no point in retrying, and the whole stage should be failed.)
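
To make the distinction in (2) concrete, here is a minimal sketch of the two separate questions a failure can raise: "does this count towards the task's failure limit?" and "should the stage fail right away?". It is not Spark's scheduler code. The class names, the flag names (counts_towards_limit, fails_stage_immediately), and the default limit of 4 are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskFailure:
    """Hypothetical description of why a task failed (not a Spark class)."""
    reason: str
    # True when the failure should count towards the per-task failure limit.
    counts_towards_limit: bool = True
    # True when retrying is pointless and the stage should fail right away.
    fails_stage_immediately: bool = False


class TaskFailureTracker:
    """Tracks failures for one task and decides whether to retry or abort."""

    def __init__(self, max_failures: int = 4):  # assumed limit, for illustration
        self.max_failures = max_failures
        self.failures = 0

    def handle_failure(self, failure: TaskFailure) -> str:
        # Fatal failures bypass the counter entirely.
        if failure.fails_stage_immediately:
            return "abort"
        # Failures unrelated to the task (e.g. executor pre-emption) are retried
        # without being counted.
        if failure.counts_towards_limit:
            self.failures += 1
        return "abort" if self.failures >= self.max_failures else "retry"
```

In this sketch, an executor lost to pre-emption would be reported with counts_towards_limit=False and can be retried any number of times, while an ordinary task exception increments the counter until the limit is reached. Keeping the two flags separate is the point of the naming concern: a flag about counting failures should not read as a flag about failing immediately.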




            • Assignee:
              kayousterhout Kay Ousterhout


              • Created: