Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Fix Version: 1.5.1
- Labels: None
Description
With dynamic allocation enabled, the spark.executor.instances config is 0, so this line ends up computing maxNumExecutorFailures as 3. For me this has caused large dynamic-allocation jobs with hundreds of executors to die because one bad node serially fails the executors allocated on it.
I think using spark.dynamicAllocation.maxExecutors would make the most sense here; I frequently run shells that scale between 1 and 1000 executors, so using spark.dynamicAllocation.minExecutors or spark.dynamicAllocation.initialExecutors would still leave me with a value lower than makes sense.
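To illustrate the failure-tolerance arithmetic, here is a small Python sketch (the actual Spark code is Scala in the YARN ApplicationMaster; the exact default formula assumed here, 2 × spark.executor.instances with a floor of 3, is my reading of it and the function names are hypothetical):

```python
def max_num_executor_failures(conf: dict) -> int:
    """Current behavior (assumed): 2x spark.executor.instances, floor of 3.
    With dynamic allocation, spark.executor.instances is 0, so this is 3."""
    instances = int(conf.get("spark.executor.instances", 0))
    return max(2 * instances, 3)


def max_num_executor_failures_proposed(conf: dict) -> int:
    """Proposed behavior: when dynamic allocation is enabled, size the
    tolerance from spark.dynamicAllocation.maxExecutors instead of the
    (zero) spark.executor.instances value."""
    if conf.get("spark.dynamicAllocation.enabled") == "true":
        effective = int(conf.get("spark.dynamicAllocation.maxExecutors", 0))
    else:
        effective = int(conf.get("spark.executor.instances", 0))
    return max(2 * effective, 3)


# A dynamic-allocation job that can scale to 1000 executors:
dyn_conf = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.maxExecutors": "1000",
}
print(max_num_executor_failures(dyn_conf))           # 3
print(max_num_executor_failures_proposed(dyn_conf))  # 2000
```

Under the current scheme a single flaky node can exhaust the failure budget of 3 almost immediately, while the proposed scheme scales the budget with the job's actual maximum size.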