Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 2.3.0
Fix Version/s: None
Description
If supervise is enabled, MesosClusterScheduler retries a failing driver indefinitely. Each retry takes up cluster resources, which are freed only when the driver is explicitly killed.
The proposed solution is to introduce a Spark configuration property, "spark.driver.supervise.maxRetries", that sets the maximum number of retries. When the property is not set, the default behavior of retrying the driver indefinitely is preserved.
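
The proposed behavior can be sketched as a small decision function. This is only an illustration, not the MesosClusterScheduler code: the function name, its parameters, and the use of None to represent an unset "spark.driver.supervise.maxRetries" are all assumptions made for the example.

```python
from typing import Optional


def should_retry(retries_so_far: int,
                 supervise: bool,
                 max_retries: Optional[int] = None) -> bool:
    """Decide whether a failed driver should be relaunched.

    Hypothetical helper: max_retries stands in for the proposed
    spark.driver.supervise.maxRetries setting, and None models the
    property being left unset.
    """
    if not supervise:
        # Without supervise, a failed driver is never relaunched.
        return False
    if max_retries is None:
        # Property unset: keep the current behavior of retrying indefinitely.
        return True
    # Property set: relaunch only while the retry budget is not exhausted.
    return retries_so_far < max_retries
```

With a limit of 3, the driver would be relaunched after its first, second, and third failures (retries_so_far of 0, 1, and 2) and then left failed, at which point its resources are released without an explicit kill.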