SPARK-24075: [Mesos] Supervised driver upon failure will be retried indefinitely unless explicitly killed

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: Mesos

    Description

      If supervise mode is enabled, MesosClusterScheduler retries a failing driver indefinitely. The retries hold cluster resources, which are freed only when the driver is explicitly killed.

      The proposed solution is to introduce a Spark configuration property, "spark.driver.supervise.maxRetries", that caps the number of retries. The default behavior of retrying the driver indefinitely is preserved.
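
The retry cap described above can be illustrated with a minimal sketch. This is not the MesosClusterScheduler code: the function names `should_retry` and `relaunch_count` are hypothetical, and modeling an unset "spark.driver.supervise.maxRetries" as `None` is an assumption made only for illustration.

```python
from typing import Optional


def should_retry(retries_so_far: int, max_retries: Optional[int] = None) -> bool:
    """Decide whether a failed supervised driver should be relaunched.

    Hypothetical helper. max_retries=None stands in for the property being
    unset (an assumption), which keeps the existing retry-forever behavior.
    """
    if max_retries is None:
        return True
    return retries_so_far < max_retries


def relaunch_count(failures: int, max_retries: Optional[int] = None) -> int:
    """Count the relaunches for a driver that fails `failures` times in a row."""
    retries = 0
    for _ in range(failures):
        if not should_retry(retries, max_retries):
            break  # cap reached: stop relaunching so resources can be released
        retries += 1
    return retries
```

With a cap of 3, a driver that keeps failing would be relaunched three times and then abandoned. With no cap, every failure triggers another relaunch, which is the behavior this issue reports.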

          People

            Assignee: Unassigned
            Reporter: Yogesh Natarajan (ynataraj)
            Votes: 0
            Watchers: 3
