Description
Due to a bug in TaskSchedulerImpl, the sudden and complete loss of an executor may cause a TaskSetManager to be leaked. The leaked TaskSetManager keeps ShuffleDependencies and other data structures alive indefinitely, which leads to several kinds of resource leaks (including shuffle file leaks).
In a nutshell, the problem is that TaskSchedulerImpl did not maintain its own mapping from executorId to running task ids, so it had no way to find and remove the affected entries in its taskId to taskSetManager maps when an executor was totally lost.
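
The bookkeeping described above can be illustrated with a minimal sketch. This is not Spark's actual code: `SchedulerBookkeeping`, `TaskSetManagerStub`, and all field and method names below are hypothetical, chosen only to show how an executorId to running task ids mapping makes cleanup possible when an executor disappears without per-task status updates.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a TaskSetManager; only its identity matters here.
final class TaskSetManagerStub(val name: String)

// Simplified, hypothetical model of the scheduler's bookkeeping (an assumption,
// not the real TaskSchedulerImpl).
class SchedulerBookkeeping {
  val taskIdToTaskSetManager = mutable.HashMap.empty[Long, TaskSetManagerStub]
  val taskIdToExecutorId = mutable.HashMap.empty[Long, String]
  // The extra mapping: which task ids are currently running on each executor.
  val executorIdToRunningTaskIds =
    mutable.HashMap.empty[String, mutable.HashSet[Long]]

  def taskLaunched(taskId: Long, executorId: String, tsm: TaskSetManagerStub): Unit = {
    taskIdToTaskSetManager(taskId) = tsm
    taskIdToExecutorId(taskId) = executorId
    executorIdToRunningTaskIds
      .getOrElseUpdate(executorId, mutable.HashSet.empty[Long]) += taskId
  }

  def taskFinished(taskId: Long): Unit = {
    taskIdToTaskSetManager.remove(taskId)
    taskIdToExecutorId.remove(taskId).foreach { executorId =>
      executorIdToRunningTaskIds.get(executorId).foreach(_ -= taskId)
    }
  }

  // On total executor loss, no per-task updates arrive, so the scheduler
  // walks its own record of running task ids to drop every stale entry.
  def executorLost(executorId: String): Unit = {
    executorIdToRunningTaskIds.remove(executorId).foreach { taskIds =>
      taskIds.foreach { taskId =>
        taskIdToTaskSetManager.remove(taskId)
        taskIdToExecutorId.remove(taskId)
      }
    }
  }
}
```

Without `executorIdToRunningTaskIds`, `executorLost` would have no direct way to tell which `taskIdToTaskSetManager` entries belong to the lost executor, so those entries (and the objects they reference) would stay reachable.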