Spark / SPARK-11120

maxNumExecutorFailures defaults to 3 under dynamic allocation


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.5.1
    • Fix Version/s: 1.6.0
    • Component/s: Spark Core
    • Labels: None
    • Target Version/s:

      Description

      With dynamic allocation, the spark.executor.instances config is 0, meaning this line ends up with maxNumExecutorFailures equal to 3. For me this has resulted in large dynamic-allocation jobs with hundreds of executors dying because one bad node serially fails the executors allocated on it.

      I think that using spark.dynamicAllocation.maxExecutors would make the most sense in this case; I frequently run shells that vary between 1 and 1000 executors, so using spark.dynamicAllocation.minExecutors or spark.dynamicAllocation.initialExecutors would still leave me with a value that is lower than makes sense.
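
      Roughly, what I have in mind is something like the sketch below (illustrative only: the standalone helper and the name effectiveNumExecutors are mine, not the actual ApplicationMaster code, and it assumes the threshold is derived from a SparkConf):

        import org.apache.spark.SparkConf

        // Sketch: derive the executor-failure threshold from the dynamic-allocation
        // cap instead of spark.executor.instances, which is 0 under dynamic allocation.
        def maxNumExecutorFailures(conf: SparkConf): Int = {
          val dynamicAllocationEnabled = conf.getBoolean("spark.dynamicAllocation.enabled", false)
          val effectiveNumExecutors =
            if (dynamicAllocationEnabled) {
              conf.getInt("spark.dynamicAllocation.maxExecutors", 0)
            } else {
              conf.getInt("spark.executor.instances", 0)
            }
          // Keep the existing default shape: at least 3, or twice the executor count,
          // unless spark.yarn.max.executor.failures is set explicitly.
          conf.getInt("spark.yarn.max.executor.failures", math.max(3, 2 * effectiveNumExecutors))
        }

      With spark.dynamicAllocation.maxExecutors=1000, for example, this would allow 2000 executor failures instead of 3; if the cap is left unset it falls back to the old default of 3.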


    People

    • Assignee: Ryan Williams (rdub)
    • Reporter: Ryan Williams (rdub)
    • Votes: 0
    • Watchers: 3
