SPARK-24622: Task attempts in other stage attempts not killed when one task attempt succeeds


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: Scheduler, Spark Core

    Description

      Looking through the code handling for https://github.com/apache/spark/pull/21577, I was trying to see how we kill task attempts. I don't see anywhere that we actually kill task attempts belonging to stage attempts other than the one that completed successfully.

       

      For instance:

      Stage 0.0 (stage id 0, attempt 0)

        - task 1.0 (task 1, attempt 0)

      Stage 0.1 (stage id 0, attempt 1), started due to a fetch failure, for instance

        - task 1.0 (task 1, attempt 0). Equivalent to task 1.0 in stage 0.0; it is rerun because task 1.0 in stage 0.0 didn't finish and didn't fail.

       

      Now if task 1.0 in stage 0.0 succeeds, it gets committed and marked as successful. We will mark the task in stage 0.1 as completed, but nowhere in the code do I see us actually kill task 1.0 in stage 0.1.

      Note that the scheduler does handle the case where there are two attempts of a task (speculation) within a single stage attempt: it kills the other attempt when one of them succeeds. See TaskSetManager.handleSuccessfulTask; a simplified sketch of that behavior is included below.
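
      To make the gap concrete, here is a minimal, self-contained sketch of the within-stage-attempt behavior described above. It is not the actual Spark source; the class and method names (Backend, TaskSetManagerSketch, addAttempt) are stand-ins for illustration. Each manager only tracks the attempts of its own stage attempt, so the kill loop never reaches the equivalent task running in another stage attempt.

{code:scala}
import scala.collection.mutable

// Illustrative stand-ins, not the real Spark scheduler classes.
final case class AttemptInfo(taskId: Long, executorId: String, var running: Boolean)

trait Backend {
  def killTask(taskId: Long, executorId: String, interruptThread: Boolean, reason: String): Unit
}

// One instance per stage attempt: it only knows about its own task attempts.
class TaskSetManagerSketch(backend: Backend) {
  private val taskAttempts = mutable.Map.empty[Int, List[AttemptInfo]]

  def addAttempt(index: Int, info: AttemptInfo): Unit =
    taskAttempts(index) = info :: taskAttempts.getOrElse(index, Nil)

  // When one attempt of a task index succeeds, kill the other running
  // attempts of that same index (the speculation case). Attempts of the
  // same partition held by a *different* stage attempt are never visible
  // here, which is the behavior this issue questions.
  def handleSuccessfulTask(index: Int, successfulTaskId: Long): Unit = {
    for (attempt <- taskAttempts.getOrElse(index, Nil)
         if attempt.running && attempt.taskId != successfulTaskId) {
      backend.killTask(attempt.taskId, attempt.executorId,
        interruptThread = true, reason = "another attempt succeeded")
      attempt.running = false
    }
  }
}
{code}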

            People

              Assignee: Unassigned
              Reporter: Thomas Graves (tgraves)
              Votes: 0
              Watchers: 2
