[SPARK-24075] [Mesos] Supervised driver upon failure will be retried indefinitely unless explicitly killed - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 2.3.0
Fix Version/s: None
Component/s: Mesos
Labels:
- bulk-closed

Description

If supervise is enabled, MesosClusterScheduler retries a failing driver indefinitely. The driver keeps consuming cluster resources, which are freed only when it is explicitly killed.

The proposed solution is to introduce a Spark configuration property, "spark.driver.supervise.maxRetries", that sets the maximum number of retries. The default behavior of retrying the driver indefinitely is preserved.
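
The retry decision described above might look roughly like the sketch below. It is a Python illustration only, not the code from the linked pull request or from MesosClusterScheduler. The property name comes from this issue. The helper names (`parse_max_retries`, `should_retry`), the choice to treat an unset property as "unlimited", and the rejection of negative values are all assumptions made for the example.

```python
from typing import Mapping, Optional

# Property name as proposed in this issue.
MAX_RETRIES_KEY = "spark.driver.supervise.maxRetries"


def parse_max_retries(conf: Mapping[str, str]) -> Optional[int]:
    """Return the retry cap, or None when the property is unset.

    Assumption: an unset property means "retry indefinitely", which keeps
    the existing default behavior.
    """
    raw = conf.get(MAX_RETRIES_KEY)
    if raw is None:
        return None
    value = int(raw)
    if value < 0:
        # Assumption: negative caps are rejected rather than treated as unlimited.
        raise ValueError(f"{MAX_RETRIES_KEY} must be non-negative, got {value}")
    return value


def should_retry(supervise: bool, retries_so_far: int,
                 max_retries: Optional[int]) -> bool:
    """Decide whether a failed driver should be relaunched (hypothetical helper)."""
    if not supervise:
        return False  # unsupervised drivers are never relaunched
    if max_retries is None:
        return True   # no cap configured: retry indefinitely
    return retries_so_far < max_retries
```

With a cap of 3, a driver that has already been retried three times would not be relaunched, while a configuration without the property keeps retrying as it does today.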

Attachments

Issue Links

links to

[Github] Pull Request #21150 (nyogesh)

Activity

People

Assignee: Unassigned

Reporter: Yogesh Natarajan

Votes: 0

Watchers: 3

Dates

Created: 25/Apr/18 02:16

Updated: 25/May/21 01:51

Resolved: 25/May/21 01:41