  1. Spark
  2. SPARK-31418

Blacklisting feature aborts Spark job without retrying for max num retries in case of Dynamic allocation

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0, 2.4.5
    • Fix Version/s: 3.1.0
    • Component/s: Spark Core
    • Labels: None

    Description

      With Spark blacklisting, if a task fails on an executor, that executor gets blacklisted for the task. To retry the task, Spark checks whether there is an idle blacklisted executor that can be killed and replaced. If there is none, it aborts the job without retrying the task the maximum number of times.

      With dynamic allocation this can be handled better: instead of killing an idle blacklisted executor (it is possible that there is none), request an additional executor and retry the task.
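The two behaviours described above can be compared with a small model. This is only a sketch in plain Scala, not Spark's scheduler code: the names `Executor`, `Decision` and `UnschedulableTaskPolicy` are hypothetical and exist only for illustration. It assumes the decision is made at the point where a failed task cannot be scheduled because every available executor is blacklisted for it.

```scala
// Hypothetical model of the decision described in this issue; not Spark's implementation.
sealed trait Decision
case object KillIdleAndReplace extends Decision // kill an idle blacklisted executor, then retry
case object RequestNewExecutor extends Decision // ask for an additional executor, then retry
case object AbortJob extends Decision           // give up without using the remaining retries

// Minimal view of an executor: is it idle, and is it blacklisted for the failed task?
final case class Executor(id: String, idle: Boolean, blacklistedForTask: Boolean)

object UnschedulableTaskPolicy {
  // Behaviour as reported: replace an idle blacklisted executor if one exists,
  // otherwise abort the job.
  def current(executors: Seq[Executor]): Decision =
    if (executors.exists(e => e.idle && e.blacklistedForTask)) KillIdleAndReplace
    else AbortJob

  // Behaviour as proposed: with dynamic allocation, request an additional
  // executor instead of depending on an idle blacklisted one.
  def proposed(executors: Seq[Executor], dynamicAllocation: Boolean): Decision =
    if (dynamicAllocation) RequestNewExecutor
    else current(executors)
}
```

With a single executor that is blacklisted for the task but still busy, `current` returns `AbortJob`, while `proposed` with dynamic allocation enabled returns `RequestNewExecutor`.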

      This can be easily reproduced with a simple job like the one below. The example is expected to fail eventually; it only shows that the task is not retried spark.task.maxFailures times:

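      // Casting an Int to String throws ClassCastException at runtime, so every task fails.
      def test(a: Int) = { a.asInstanceOf[String] }
      // 10 partitions, one task each; run in spark-shell, where sc is predefined.
      sc.parallelize(1 to 10, 10).map(x => test(x)).collect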
      

      Run it with dynamic allocation enabled and the minimum number of executors set to 1. There are various other cases where this can fail as well.
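The settings mentioned above might be expressed as in the following configuration sketch. The report does not list its exact configuration, so several values here are assumptions: the application name is made up, `spark.task.maxFailures` is set to its default of 4, and the external shuffle service is enabled on the assumption that the cluster uses it for dynamic allocation. The master URL is assumed to be supplied by `spark-submit`.

```scala
import org.apache.spark.SparkConf

// Configuration sketch for reproducing the report; values other than the
// dynamic allocation settings named in the description are assumptions.
val conf = new SparkConf()
  .setAppName("SPARK-31418-repro")                // hypothetical name
  .set("spark.dynamicAllocation.enabled", "true") // from the description
  .set("spark.dynamicAllocation.minExecutors", "1") // from the description
  .set("spark.shuffle.service.enabled", "true")   // assumption: external shuffle service is available
  .set("spark.blacklist.enabled", "true")         // turn on the blacklisting feature
  .set("spark.task.maxFailures", "4")             // assumption: default value, stated explicitly
```

The same keys can be passed as `--conf key=value` options when launching `spark-shell`.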

            People

              Assignee: vsowrirajan Venkata krishnan Sowrirajan
              Reporter: vsowrirajan Venkata krishnan Sowrirajan