Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version/s: 2.1.0
- Fix Version/s: None
Description
Looking through the code handling for https://github.com/apache/spark/pull/21577, I wanted to see how we kill task attempts. I don't see anywhere that we actually kill task attempts belonging to stage attempts other than the one that completed successfully.
For instance:
- stage 0.0 (stage id 0, attempt 0)
  - task 1.0 (task 1, attempt 0)
- stage 0.1 (stage id 0, attempt 1), started due to a fetch failure, for instance
  - task 1.0 (task 1, attempt 0): the equivalent of task 1.0 in stage 0.0, launched because task 1.0 in stage 0.0 had neither finished nor failed
Now if task 1.0 in stage 0.0 succeeds, it gets committed and marked as successful. We will mark the corresponding task in stage 0.1 as completed, but nowhere in the code do I see it actually kill task 1.0 in stage 0.1.
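To make the gap concrete, here is a hypothetical sketch of what the missing cross-stage-attempt kill step could look like. None of these types or helpers exist in Spark as written; killTaskAttempt loosely mirrors the real TaskScheduler.killTaskAttempt API, and everything else is a stand-in for illustration only.
```scala
// Hypothetical sketch (NOT actual Spark code): when a task for a partition
// succeeds in one stage attempt, also kill the equivalent running task for
// the same partition in every other (zombie) attempt of that stage.
object ZombieTaskKiller {
  // Minimal stand-ins for Spark's internal scheduler types.
  trait RunningTask { def taskId: Long; def partitionId: Int }
  trait StageAttempt {
    def stageAttemptId: Int
    def runningTasks: Seq[RunningTask]
  }
  trait Scheduler {
    // Assumed accessor: all live TaskSetManagers for a stage id.
    def attemptsForStage(stageId: Int): Seq[StageAttempt]
    // Loosely mirrors TaskScheduler.killTaskAttempt(taskId, interruptThread, reason).
    def killTaskAttempt(taskId: Long, interruptThread: Boolean, reason: String): Boolean
  }

  def killEquivalentTasks(sched: Scheduler, stageId: Int,
                          succeededAttempt: Int, partitionId: Int): Unit = {
    for {
      attempt <- sched.attemptsForStage(stageId)
      if attempt.stageAttemptId != succeededAttempt // zombie attempts only
      task <- attempt.runningTasks
      if task.partitionId == partitionId            // same partition as the winner
    } sched.killTaskAttempt(task.taskId, interruptThread = false,
        reason = s"partition $partitionId completed by another stage attempt")
  }
}
```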
Note that the scheduler does handle the case of two attempts of the same task (speculation) within a single stage attempt: when one attempt succeeds, it kills the other. See TaskSetManager.handleSuccessfulTask.
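For contrast, here is a rough, self-contained paraphrase of that within-stage-attempt speculation path. It is not the actual Spark source; the types and parameter names are stand-ins, and only the (taskId, executorId, interruptThread, reason) shape of the kill callback is modeled on SchedulerBackend.killTask in recent Spark versions.
```scala
// Simplified paraphrase of the speculation-kill logic in
// TaskSetManager.handleSuccessfulTask: once one attempt of a task index
// succeeds, any other still-running attempts of that SAME task set are killed.
object SpeculationKill {
  final case class TaskAttempt(taskId: Long, executorId: String, running: Boolean)

  def handleSuccessfulTask(
      attemptsForIndex: Seq[TaskAttempt],   // all attempts of one task index
      succeededTaskId: Long,
      killTask: (Long, String, Boolean, String) => Unit): Unit = {
    for (attempt <- attemptsForIndex
         if attempt.running && attempt.taskId != succeededTaskId) {
      // Kill the losing speculative attempt on its executor.
      killTask(attempt.taskId, attempt.executorId, false,
        "another attempt succeeded")
    }
  }
}
```
The point of the bug report is that this kill only happens across attempts within one TaskSetManager; nothing analogous runs across different stage attempts of the same stage.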
Issue Links
- is related to
  - SPARK-2666 Always try to cancel running tasks when a stage is marked as zombie (Resolved)
- relates to
  - SPARK-25250 Race condition with tasks running when new attempt for same stage is created leads to other task in the next attempt running on the same partition id retry multiple times (Resolved)
  - SPARK-25773 Cancel zombie tasks in a result stage when the job finishes (Resolved)