  1. Spark
  2. SPARK-27630

Stage retry causes totalRunningTasks calculation to be negative

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 3.0.0
    • Component/s: Spark Core
    • Labels: None

    Description

      Track tasks separately for each stage attempt (instead of per stage), and do NOT reset numRunningTasks to 0 on StageCompleted.
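
      The per-attempt tracking described above can be modelled with a small sketch. This is not Spark's code: RunningTaskTracker and its method names are hypothetical, and the model only assumes that every task start and task end event carries both a stage ID and a stage attempt ID.

```python
class RunningTaskTracker:
    """Hypothetical model: counts running tasks per (stage_id, attempt_id)."""

    def __init__(self):
        self._running = {}

    def on_task_start(self, stage_id, attempt_id):
        key = (stage_id, attempt_id)
        self._running[key] = self._running.get(key, 0) + 1

    def on_task_end(self, stage_id, attempt_id):
        key = (stage_id, attempt_id)
        remaining = self._running.get(key, 0) - 1
        if remaining > 0:
            self._running[key] = remaining
        else:
            # Drop the entry instead of storing zero or a negative count.
            self._running.pop(key, None)

    def on_stage_completed(self, stage_id, attempt_id):
        # Intentionally leaves the counter alone: tasks of a completed
        # (zombie) attempt may still report taskEnd later.
        pass

    def total_running_tasks(self):
        return sum(self._running.values())


tracker = RunningTaskTracker()
tracker.on_task_start(stage_id=1, attempt_id=0)
tracker.on_task_start(stage_id=1, attempt_id=0)
tracker.on_stage_completed(stage_id=1, attempt_id=0)  # attempt 0 becomes a zombie
tracker.on_task_start(stage_id=1, attempt_id=1)       # the retry starts one task
tracker.on_task_end(stage_id=1, attempt_id=0)         # late events from the zombie
tracker.on_task_end(stage_id=1, attempt_id=0)
total = tracker.total_running_tasks()  # 1: only the retry's task is still running
```

      For comparison, a single per-stage counter that is reset to 0 on StageCompleted would go through 0 + 1 - 2 = -1 for the same event sequence.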

      During a stage retry, a taskEnd event from the zombie stage attempt can make totalRunningTasks negative, which causes the job to get stuck.
      A similar problem exists with stageIdToTaskIndices and stageIdToSpeculativeTaskIndices.
      If a failed taskEnd event arrives from the zombie stage attempt, it removes the active attempt's task index from stageIdToTaskIndices or stageIdToSpeculativeTaskIndices, and totalPendingTasks increases unexpectedly.
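
      The task-index problem can be reproduced with a short simulation. It is a sketch under stated assumptions, not Spark's implementation: the event tuples, replay, and pending_tasks are hypothetical, and pending tasks are assumed to be the stage's task count minus the number of task indices already started.

```python
def pending_tasks(num_tasks, started_indices):
    """Tasks not yet started: total tasks minus indices already started."""
    return num_tasks - len(started_indices)


def replay(events, key_of):
    """Replay (kind, stage_id, attempt_id, task_index) events.

    key_of decides how started indices are grouped; it is the only
    difference between the two behaviours compared below.
    """
    started = {}
    for kind, stage_id, attempt_id, task_index in events:
        indices = started.setdefault(key_of(stage_id, attempt_id), set())
        if kind == "start":
            indices.add(task_index)
        elif kind == "failed_end":
            # A failed task has to run again, so its index becomes pending.
            indices.discard(task_index)
    return started


events = [
    ("start", 1, 0, 0),       # attempt 0 starts task index 0
    ("start", 1, 1, 0),       # retry: active attempt 1 starts task index 0
    ("failed_end", 1, 0, 0),  # late failure from zombie attempt 0
]

by_stage = replay(events, lambda stage_id, attempt_id: stage_id)
by_attempt = replay(events, lambda stage_id, attempt_id: (stage_id, attempt_id))

NUM_TASKS = 2  # hypothetical stage with two tasks
pending_by_stage = pending_tasks(NUM_TASKS, by_stage[1])            # 2
pending_by_attempt = pending_tasks(NUM_TASKS, by_attempt[(1, 1)])   # 1
```

      Keyed by stage only, the zombie's failure erases index 0 although the active attempt is still running it, so the pending count rises from 1 to 2. Keyed by stage attempt, the active attempt's set is untouched.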

            People

              Assignee: dzcxzl
              Reporter: dzcxzl
              Votes: 0
              Watchers: 5
