Description
If a job is submitted to run locally with masterURL = "local[X]", Spark will not retry a failed task, regardless of the "spark.task.maxFailures" setting. This design facilitates debugging and QA of Spark applications in which all tasks are expected to succeed and yield a result. Unfortunately, it also prevents a local job from finishing if any of its tasks cannot guarantee a result (e.g. a task that visits an external resource/API) and retrying inside the task is not preferred (e.g. because in production the task needs to be re-executed on a different machine).
Users can still set masterURL = "local[X,Y]" to override this (where Y is the local maxFailures), but this is undocumented and hard to manage. A quick fix would be to add a new configuration property "spark.local.maxFailures" with a default value of 1, so that users know exactly what to change when reading the documentation.
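To make the two master URL forms concrete, here is a minimal Python sketch of how they map to an effective retry limit as described above. The helpers `local_master` and `local_max_failures` and the regular expressions are hypothetical illustrations written for this report, not Spark's own code; they only encode the stated behaviour ("local[X]" means no retry, "local[X,Y]" means Y is the local maxFailures).

```python
import re

# Hypothetical patterns for illustration only (not taken from Spark's source).
_LOCAL_N = re.compile(r"local\[(\d+|\*)\]")
_LOCAL_N_FAILURES = re.compile(r"local\[(\d+|\*)\s*,\s*(\d+)\]")


def local_master(threads, max_failures=None):
    """Build a local master URL; include Y only when max_failures is given."""
    if max_failures is None:
        return f"local[{threads}]"
    return f"local[{threads},{max_failures}]"


def local_max_failures(master_url):
    """Return the effective max task failures for a local master URL.

    "local[X]"   -> 1 (a failed task is not retried)
    "local[X,Y]" -> Y
    anything else -> None (not one of the two local forms covered here)
    """
    match = _LOCAL_N_FAILURES.fullmatch(master_url)
    if match:
        return int(match.group(2))
    if _LOCAL_N.fullmatch(master_url):
        return 1
    return None


if __name__ == "__main__":
    for url in (local_master(4), local_master(4, 3)):
        print(url, "->", local_max_failures(url))
```

The string built by `local_master(4, 3)` is the kind of value one would pass as the master URL (for example through `SparkConf.setMaster`) to get retries in local mode today; the proposed "spark.local.maxFailures" property would make that choice explicit in configuration instead.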