SPARK-29177: Zombie tasks prevent executors from being released when a task result exceeds maxResultSize


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.4, 2.4.4
    • Fix Version/s: 2.4.5, 3.0.0
    • Component/s: Spark Core
    • Labels: None

    Description

      When Spark fetches task results from executors and finds that their total size exceeds the configured maxResultSize, it aborts the stage and all dependent jobs. However, the task that triggered the limit has actually completed successfully, yet no `TaskEnd` event is ever posted for it, so it is never removed from `CoarseGrainedSchedulerBackend`'s bookkeeping. With dynamic allocation enabled, the executor that ran this zombie task is therefore never considered idle: it remains registered with the resource manager and is not released until the application ends.
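      The following is a minimal reproduction sketch, not part of the original report; the tiny maxResultSize value, the partition count, and the dynamic-allocation settings are illustrative assumptions. It runs a collect() whose aggregated result exceeds the limit while dynamic allocation is enabled, after which the executor that ran the oversized-result task can be observed lingering in the Executors tab of the Spark UI instead of being released:

```scala
import org.apache.spark.sql.SparkSession

object MaxResultSizeZombieRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("maxResultSize-zombie-repro")
      .config("spark.driver.maxResultSize", "1m")          // deliberately tiny limit
      .config("spark.dynamicAllocation.enabled", "true")   // also needs the external shuffle service (or shuffle tracking on 3.x)
      .config("spark.dynamicAllocation.executorIdleTimeout", "10s")
      .getOrCreate()

    try {
      // Each of the 8 partitions returns roughly 1 MB, so the aggregated result
      // exceeds the 1m limit and the stage is aborted with
      // "Total size of serialized results ... bigger than spark.driver.maxResultSize".
      spark.sparkContext
        .parallelize(1 to 8, 8)
        .map(_ => Array.fill(1024 * 1024)('x').mkString)
        .collect()
    } catch {
      case e: Exception => println(s"Job failed as expected: ${e.getMessage}")
    }

    // Before the fix, the executor that ran the oversized-result task is still
    // tracked as busy by CoarseGrainedSchedulerBackend, so it is never reported
    // idle and never released. Keep the application alive to observe it.
    Thread.sleep(60000)
    spark.stop()
  }
}
```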


            People

              Assignee: Adrian Wang (adrian-wang)
              Reporter: Adrian Wang (adrian-wang)
              Votes: 0
              Watchers: 2
