SPARK-18553

Executor loss may cause TaskSetManager to be leaked

    Description

      Due to a bug in TaskSchedulerImpl, the sudden and complete loss of an executor may cause a TaskSetManager to be leaked. The leaked TaskSetManager keeps ShuffleDependencies and other data structures alive indefinitely, which leads to various resource leaks (including shuffle file leaks).

      In short, TaskSchedulerImpl did not maintain its own mapping from executorId to running task IDs, so it could not clean up its taskId-to-TaskSetManager maps when an executor was lost entirely.
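
      The bookkeeping gap described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark's actual Scala code: the class `ToyScheduler`, its method names, and the string stand-ins for TaskSetManager instances are all hypothetical. It shows how a per-executor index of running task IDs makes it possible to drop every taskId entry that belonged to a lost executor.

```python
from collections import defaultdict


class ToyScheduler:
    """Toy model of the bookkeeping only; hypothetical, not Spark's code."""

    def __init__(self):
        # taskId -> owning task set (a string stands in for a TaskSetManager).
        self.task_id_to_task_set = {}
        # executorId -> IDs of tasks currently running on that executor.
        # This is the kind of mapping the description says was missing.
        self.executor_id_to_running_task_ids = defaultdict(set)

    def task_started(self, task_id, executor_id, task_set):
        self.task_id_to_task_set[task_id] = task_set
        self.executor_id_to_running_task_ids[executor_id].add(task_id)

    def task_finished(self, task_id, executor_id):
        self.task_id_to_task_set.pop(task_id, None)
        self.executor_id_to_running_task_ids[executor_id].discard(task_id)

    def executor_lost(self, executor_id):
        # Assumption: a lost executor sends no further task updates, so the
        # per-executor index is the only way to find its entries.
        lost = self.executor_id_to_running_task_ids.pop(executor_id, set())
        for task_id in lost:
            self.task_id_to_task_set.pop(task_id, None)
        return lost


scheduler = ToyScheduler()
scheduler.task_started(1, "exec-1", "taskSet-0.0")
scheduler.task_started(2, "exec-1", "taskSet-0.0")
scheduler.task_started(3, "exec-2", "taskSet-0.0")

lost_tasks = scheduler.executor_lost("exec-1")
```

      Without the per-executor index, the entries for tasks 1 and 2 would remain in the taskId map with nothing left to remove them, which in this model corresponds to the leaked references the description mentions.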

          People

            joshrosen Josh Rosen
