Spark / SPARK-24075

[Mesos] Supervised driver upon failure will be retried indefinitely unless explicitly killed

    Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: Mesos
    • Labels: None

      Description

      If supervise is enabled, MesosClusterScheduler retries a failing driver indefinitely. This ties up cluster resources, which are freed only when the driver is explicitly killed.

      The proposed solution is to introduce a Spark configuration property, "spark.driver.supervise.maxRetries", that sets the maximum number of retries. When the property is not set, the default behavior of retrying the driver indefinitely is preserved.
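
The proposed behavior can be illustrated with a minimal sketch. This is not the MesosClusterScheduler code: the function names `max_retries_from_conf` and `should_retry`, the plain-dictionary configuration, and the choice to reject negative values are all assumptions made for illustration. Only the property name comes from this issue.

```python
from typing import Mapping, Optional

# Property name as proposed in this issue.
MAX_RETRIES_KEY = "spark.driver.supervise.maxRetries"


def max_retries_from_conf(conf: Mapping[str, str]) -> Optional[int]:
    """Return the configured retry cap, or None when the property is unset.

    None stands for "no cap", which preserves the current default of
    retrying indefinitely. (Hypothetical helper, not a Spark API.)
    """
    raw = conf.get(MAX_RETRIES_KEY)
    if raw is None:
        return None
    value = int(raw)
    if value < 0:
        # Assumption: negative caps are treated as a configuration error.
        raise ValueError(f"{MAX_RETRIES_KEY} must be non-negative, got {value}")
    return value


def should_retry(supervise: bool, retries_so_far: int,
                 max_retries: Optional[int]) -> bool:
    """Decide whether a failed driver should be relaunched.

    (Hypothetical helper, not a Spark API.)
    """
    if not supervise:
        return False  # unsupervised drivers are never relaunched
    if max_retries is None:
        return True   # no cap configured: keep retrying indefinitely
    return retries_so_far < max_retries


# Example: with a cap of 3, the driver is relaunched after its first three
# failures and then left in a failed state, releasing its resources.
conf = {MAX_RETRIES_KEY: "3"}
cap = max_retries_from_conf(conf)
decisions = [should_retry(True, n, cap) for n in range(5)]
```

Representing "unset" as `None` rather than a sentinel number keeps the unlimited default distinct from an explicit cap of zero, which in this sketch means "supervised, but never relaunch".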

            People

            • Assignee: Unassigned
            • Reporter: Yogesh Natarajan (ynataraj)
            • Votes: 0
            • Watchers: 3