SPARK-29177: Zombie tasks prevent executors from being released when a task result exceeds maxResultSize


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.4, 2.4.4
    • Fix Version/s: 2.4.5, 3.0.0
    • Component/s: Spark Core
    • Labels: None

    Description

      When Spark fetches task results from executors and finds that their total size exceeds the configured maxResultSize, it aborts the stage and all dependent jobs. However, the task that triggered the limit has actually completed successfully, yet no `TaskEnd` event is ever posted for it, so it is never removed from `CoarseGrainedSchedulerBackend`'s bookkeeping. With dynamic allocation enabled, the executor that ran this zombie task is therefore never considered idle: it remains registered with the resource manager and is not released until the application ends.
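      The following is a minimal reproduction sketch, not part of the original report; the tiny maxResultSize value, the partition count, and the dynamic-allocation settings are illustrative assumptions. It runs a collect() whose aggregated result exceeds the limit while dynamic allocation is enabled, after which the executor that ran the oversized-result task can be observed lingering in the Executors tab of the Spark UI instead of being released:

```scala
import org.apache.spark.sql.SparkSession

object MaxResultSizeZombieRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("maxResultSize-zombie-repro")
      .config("spark.driver.maxResultSize", "1m")          // deliberately tiny limit
      .config("spark.dynamicAllocation.enabled", "true")   // also needs the external shuffle service (or shuffle tracking on 3.x)
      .config("spark.dynamicAllocation.executorIdleTimeout", "10s")
      .getOrCreate()

    try {
      // Each of the 8 partitions returns roughly 1 MB, so the aggregated result
      // exceeds the 1m limit and the stage is aborted with
      // "Total size of serialized results ... bigger than spark.driver.maxResultSize".
      spark.sparkContext
        .parallelize(1 to 8, 8)
        .map(_ => Array.fill(1024 * 1024)('x').mkString)
        .collect()
    } catch {
      case e: Exception => println(s"Job failed as expected: ${e.getMessage}")
    }

    // Before the fix, the executor that ran the oversized-result task is still
    // tracked as busy by CoarseGrainedSchedulerBackend, so it is never reported
    // idle and never released. Keep the application alive to observe it.
    Thread.sleep(60000)
    spark.stop()
  }
}
```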


            People

              Assignee: Adrian Wang (adrian-wang)
              Reporter: Adrian Wang (adrian-wang)
              Votes: 0
              Watchers: 2
