[SPARK-32518] CoarseGrainedSchedulerBackend.maxNumConcurrentTasks should consider all kinds of resources

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0
Fix Version/s: 3.0.1, 3.1.0
Component/s: Spark Core
Labels: None

Description

Currently, CoarseGrainedSchedulerBackend.maxNumConcurrentTasks considers only CPU cores when computing the maximum number of concurrent tasks. This can cause an application to hang when a barrier stage requires additional custom resources but the cluster does not have enough of them. Because maxNumConcurrentTasks does not check custom resources, the barrier stage passes that check and is submitted to TaskSchedulerImpl. TaskSchedulerImpl then cannot launch the stage's tasks, because the number of task slots computed by calculateAvailableSlots (which does check all kinds of resources) is insufficient.
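
The mismatch described above can be illustrated with a small, simplified model. This is only a sketch and not Spark's actual code: the Executor class, the two slot functions, the "gpu" resource name, and the integer per-task amounts are all assumptions made for illustration. It shows how a CPU-only slot count can admit a barrier stage that a count over all resources would reject.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Executor:
    # Hypothetical stand-in for an executor's free capacity.
    cores: int
    resources: Dict[str, int] = field(default_factory=dict)


def slots_cpu_only(executors: List[Executor], cpus_per_task: int) -> int:
    # Mirrors the behavior reported in this issue: only CPU cores are counted.
    return sum(e.cores // cpus_per_task for e in executors)


def slots_all_resources(
    executors: List[Executor],
    cpus_per_task: int,
    task_resources: Dict[str, int],
) -> int:
    # Per executor, the scarcest resource limits how many tasks fit.
    total = 0
    for e in executors:
        limits = [e.cores // cpus_per_task]
        for name, amount in task_resources.items():
            limits.append(e.resources.get(name, 0) // amount)
        total += min(limits)
    return total


# Two executors, each with 4 cores but only 1 GPU (assumed values).
executors = [
    Executor(cores=4, resources={"gpu": 1}),
    Executor(cores=4, resources={"gpu": 1}),
]
barrier_tasks = 4  # the barrier stage needs all 4 tasks running at once

cpu_only = slots_cpu_only(executors, cpus_per_task=1)
all_res = slots_all_resources(executors, cpus_per_task=1, task_resources={"gpu": 1})

# The CPU-only count admits the stage; the full count shows it can never launch.
print("CPU-only check passes:", cpu_only >= barrier_tasks)
print("All-resource check passes:", all_res >= barrier_tasks)
```

In this model the two counts disagree, which corresponds to the hang: the stage is accepted by the first check but never gets enough slots under the second. Making both checks use the same all-resource calculation lets the stage fail fast instead.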

Issue Links

links to

[Github] Pull Request #29332 (Ngone51)

[Github] Pull Request #29395 (Ngone51)

People

Assignee: wuyi

Reporter: wuyi

Votes: 0

Watchers: 3

Dates

Created: 03/Aug/20 05:53

Updated: 18/Aug/20 06:51

Resolved: 06/Aug/20 05:40