Spark / SPARK-11120

maxNumExecutorFailures defaults to 3 under dynamic allocation


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.5.1
    • Fix Version/s: 1.6.0
    • Component/s: Spark Core
    • Labels: None

    Description

      With dynamic allocation, the spark.executor.instances config is 0, so this line ends up setting maxNumExecutorFailures to 3. For me that has meant large dynamic-allocation jobs with hundreds of executors dying because one bad node serially fails every executor that gets allocated on it.
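
      A minimal sketch of how that default collapses to 3; the max(2 * instances, 3) shape and the spark.yarn.max.executor.failures override approximate the ApplicationMaster logic rather than quoting it:

          import org.apache.spark.SparkConf

          val conf = new SparkConf()
          // Under dynamic allocation, spark.executor.instances is unset and reads as 0.
          val numExecutors = conf.getInt("spark.executor.instances", 0)
          // Assumed shape of the default: max(2 * instances, 3), overridable via
          // spark.yarn.max.executor.failures. With 0 instances this collapses to 3.
          val maxNumExecutorFailures = conf.getInt(
            "spark.yarn.max.executor.failures",
            math.max(numExecutors * 2, 3))
          println(s"maxNumExecutorFailures = $maxNumExecutorFailures")  // prints 3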

      I think using spark.dynamicAllocation.maxExecutors would make the most sense in this case. I frequently run shells that vary between 1 and 1000 executors, so basing the default on spark.dynamicAllocation.minExecutors or spark.dynamicAllocation.initialExecutors would still leave me with a value that is lower than makes sense.
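
      A hedged sketch of that proposal; the property names are real Spark configs, but the dynamic-allocation check and the surrounding shape are assumptions about the fix, not the actual patch:

          import org.apache.spark.SparkConf

          val conf = new SparkConf()
          // Assumed check: with dynamic allocation on, derive the failure budget from
          // the allocation ceiling rather than spark.executor.instances (0 in that mode).
          val effectiveExecutors =
            if (conf.getBoolean("spark.dynamicAllocation.enabled", false)) {
              conf.getInt("spark.dynamicAllocation.maxExecutors", 0)
            } else {
              conf.getInt("spark.executor.instances", 0)
            }
          val maxNumExecutorFailures = conf.getInt(
            "spark.yarn.max.executor.failures",
            math.max(effectiveExecutors * 2, 3))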

    Attachments

    Activity

    People

        Assignee: rdub (Ryan Williams)
        Reporter: rdub (Ryan Williams)
        Votes: 0
        Watchers: 3

    Dates

        Created:
        Updated:
        Resolved: