Looking through the code for https://github.com/apache/spark/pull/21577, I was checking how we kill task attempts. I don't see anywhere that we actually kill task attempts belonging to stage attempts other than the one that completed successfully.
Stage 0.0 (stage id 0, attempt 0)
- task 1.0 (task 1, attempt 0)

Stage 0.1 (stage id 0, attempt 1), started due to a fetch failure, for instance
- task 1.0 (task 1, attempt 0): the equivalent of task 1.0 in stage 0.0, launched because task 1.0 in stage 0.0 had neither finished nor failed.
Now if task 1.0 in stage 0.0 succeeds, it gets committed and marked as successful. We will mark the corresponding task in stage 0.1 as completed, but nowhere in the code do I see it actually kill task 1.0 in stage 0.1.
Note that the scheduler does handle the case of two attempts (via speculation) within a single stage attempt: it kills the other attempt when one of them succeeds. See TaskSetManager.handleSuccessfulTask.
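To make the contrast concrete, here is a minimal, hedged sketch of that within-stage-attempt behavior. The names (`TaskAttempt`, `SimpleTaskSet`, `launch`, `killedAttempts`) are simplified stand-ins for illustration, not Spark's real API; only the method name `handleSuccessfulTask` mirrors the actual code. The key point the sketch shows: the kill loop only scans attempts tracked by the same task set, so an equivalent task held by a different stage attempt is never touched.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified model of one stage attempt's task set.
case class TaskAttempt(taskIndex: Int, attemptNumber: Int, var running: Boolean = true)

class SimpleTaskSet(val stageAttempt: Int) {
  private val attempts = ArrayBuffer.empty[TaskAttempt]
  private val killed = ArrayBuffer.empty[TaskAttempt]

  def launch(taskIndex: Int, attemptNumber: Int): TaskAttempt = {
    val a = TaskAttempt(taskIndex, attemptNumber)
    attempts += a
    a
  }

  // Analogue of the behavior described above: when one attempt of a task
  // succeeds, every OTHER still-running attempt of the same task index
  // *in this task set* is killed.
  def handleSuccessfulTask(succeeded: TaskAttempt): Unit = {
    for (a <- attempts if a.taskIndex == succeeded.taskIndex && (a ne succeeded) && a.running) {
      a.running = false
      killed += a
    }
    succeeded.running = false
  }

  def killedAttempts: Seq[TaskAttempt] = killed.toSeq
}

object Demo extends App {
  val ts = new SimpleTaskSet(stageAttempt = 0)
  val t10 = ts.launch(taskIndex = 1, attemptNumber = 0)
  val t11 = ts.launch(taskIndex = 1, attemptNumber = 1) // speculative copy

  ts.handleSuccessfulTask(t10)
  // The speculative attempt 1.1 is killed. Nothing in this model (or, per
  // the report above, in the real code path) reaches into a task set owned
  // by a DIFFERENT stage attempt to kill its equivalent task.
  println(ts.killedAttempts.map(a => s"${a.taskIndex}.${a.attemptNumber}").mkString(","))
}
```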