Follow-up to SPARK-5259. During stage retry, it's possible for a stage to "complete" by registering all of its map output and starting the downstream stages before the latest task set has completed. The earlier task set then continues to submit tasks that are unnecessary and that increase the chance of hitting SPARK-8029.
Spark should mark all task sets for a stage as zombie as soon as the stage's map output is registered. Note that this involves coordination between the various scheduler components (DAGScheduler and TaskSetManager at least), which isn't easily testable with the current setup.
To be clear, this is not just referring to canceling running tasks (which may be taken care of by SPARK-2666). This is to make sure that the task set is marked as a zombie, to prevent submitting new tasks from this task set.
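
To illustrate the intended behavior, here is a minimal toy model in Python. It is only a sketch and not Spark's scheduler code: `ToyTaskSet`, `ToyScheduler`, and all of their methods are hypothetical names invented for this example. It assumes a stage counts as complete once every partition has a registered map output, at which point every task set for that stage is marked zombie and stops handing out new tasks. Already-running tasks are deliberately left alone.

```python
class ToyTaskSet:
    """Hypothetical stand-in for one attempt of a stage (not Spark's TaskSetManager)."""

    def __init__(self, stage_id, attempt, partitions):
        self.stage_id = stage_id
        self.attempt = attempt
        self.pending = list(partitions)
        self.is_zombie = False

    def next_task(self):
        # A zombie task set never hands out new tasks.
        # Tasks that are already running are not cancelled here.
        if self.is_zombie or not self.pending:
            return None
        return self.pending.pop(0)


class ToyScheduler:
    """Hypothetical stand-in for the DAGScheduler side of the coordination."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self.map_outputs = {}  # stage_id -> set of partitions with registered output
        self.task_sets = {}    # stage_id -> every ToyTaskSet submitted for that stage

    def submit(self, stage_id, attempt, partitions):
        task_set = ToyTaskSet(stage_id, attempt, partitions)
        self.task_sets.setdefault(stage_id, []).append(task_set)
        return task_set

    def register_map_output(self, stage_id, partition):
        done = self.map_outputs.setdefault(stage_id, set())
        done.add(partition)
        # Once all map output is registered, mark ALL attempts as zombie,
        # not only the latest one.
        if len(done) == self.num_partitions:
            for task_set in self.task_sets.get(stage_id, []):
                task_set.is_zombie = True


sched = ToyScheduler(num_partitions=3)
attempt0 = sched.submit(stage_id=0, attempt=0, partitions=[0, 1, 2])
attempt1 = sched.submit(stage_id=0, attempt=1, partitions=[1, 2])

attempt0.next_task()  # attempt 0 launches the task for partition 0
for partition in range(3):
    sched.register_map_output(0, partition)  # output arrives from either attempt

# Neither attempt offers new tasks, although both still have pending partitions.
print(attempt0.next_task(), attempt1.next_task())
```

In this toy model the zombie flag only gates `next_task`, which mirrors the distinction drawn above: killing running tasks is a separate concern from preventing new submissions.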