Spark / SPARK-11178

Improve naming around task failures in scheduler code


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: Scheduler, Spark Core
    • Labels: None

    Description

      Commit af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0 introduced new functionality so that if an executor dies for a reason that's not caused by one of the tasks running on the executor (e.g., due to pre-emption), Spark doesn't count the failure towards the maximum number of failures for the task. That commit introduced some vague naming that I think we should fix; in particular:

      (1) The variable "isNormalExit" was used to refer to cases where the executor died for a reason unrelated to the tasks running on the machine. The problem with the existing name is that it's not clear (at least to me!) what it means for an exit to be "normal".

      (2) The variable "shouldEventuallyFailJob" is used to determine whether a task's failure should be counted towards the maximum number of failures allowed for a task before the associated Stage is aborted. The problem with the existing name is that it can be misread as implying that the task's failure should immediately cause the stage to fail because it is somehow fatal (as is the case for a fetch failure, for example: if a task fails because of a fetch failure, there's no point in retrying, and the whole stage should be failed). A short sketch of the counting behavior in question follows below.
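
      For illustration only, here is a minimal, self-contained sketch of the behavior described above: a failure that was not caused by the application (e.g. pre-emption) is not counted towards a task's failure limit, while application-caused failures are, and the stage is aborted once the limit is reached. This is not Spark's actual scheduler code; the names ExecutorExitInfo, exitCausedByApp, countTowardsTaskFailures, and TaskFailureTracker are illustrative placeholders, not the identifiers used in Spark.

      // Hypothetical sketch of the failure-counting behavior; names are illustrative,
      // not Spark's real scheduler classes or fields.
      case class ExecutorExitInfo(message: String, exitCausedByApp: Boolean)

      class TaskFailureTracker(maxTaskFailures: Int) {
        // Per-task count of failures that are attributable to the task/application.
        private val failureCounts =
          scala.collection.mutable.Map[Int, Int]().withDefaultValue(0)

        /** Record a failed task attempt; returns true if the stage should be aborted. */
        def recordFailure(taskIndex: Int, countTowardsTaskFailures: Boolean): Boolean = {
          if (countTowardsTaskFailures) {
            failureCounts(taskIndex) += 1
          }
          failureCounts(taskIndex) >= maxTaskFailures
        }
      }

      object Demo extends App {
        val tracker = new TaskFailureTracker(maxTaskFailures = 4)

        // Executor lost to pre-emption: the task is not at fault, so nothing is counted.
        val preempted = ExecutorExitInfo("container pre-empted", exitCausedByApp = false)
        println(tracker.recordFailure(taskIndex = 0,
          countTowardsTaskFailures = preempted.exitCausedByApp))  // false

        // Failures caused by the application do count and eventually abort the stage.
        val appCrash = ExecutorExitInfo("OutOfMemoryError in task", exitCausedByApp = true)
        val aborted = (1 to 4).map { _ =>
          tracker.recordFailure(taskIndex = 0,
            countTowardsTaskFailures = appCrash.exitCausedByApp)
        }
        println(aborted.last)  // true: the fourth counted failure reaches the limit
      }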


          People

            Assignee: Kay Ousterhout (kayousterhout)
            Reporter: Kay Ousterhout (kayousterhout)
            Votes: 0
            Watchers: 2
