Spark / SPARK-7901

Attempt to request negative number of executors with dynamic allocation

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.3.1
    • Fix Version/s: None
    • Component/s: YARN
    • Labels: None

      Description

      I ran a spark-shell on YARN with dynamic allocation enabled; relevant params:

        --conf spark.dynamicAllocation.enabled=true \
        --conf spark.dynamicAllocation.minExecutors=5 \
        --conf spark.dynamicAllocation.maxExecutors=300 \
        --conf spark.dynamicAllocation.schedulerBacklogTimeout=3 \
        --conf spark.dynamicAllocation.executorIdleTimeout=300
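
      To make the settings above easier to check at a glance, here is a minimal sketch that parses the same `--conf` flags and sanity-checks them. The helper `parse_conf_flags` is hypothetical and is not part of Spark. It also assumes the timeouts are plain seconds, which is consistent with the 5 minutes of idle time reported in this issue.

```python
import shlex

# The flags from the report, written without the shell line continuations.
FLAGS = """
--conf spark.dynamicAllocation.enabled=true
--conf spark.dynamicAllocation.minExecutors=5
--conf spark.dynamicAllocation.maxExecutors=300
--conf spark.dynamicAllocation.schedulerBacklogTimeout=3
--conf spark.dynamicAllocation.executorIdleTimeout=300
"""


def parse_conf_flags(text):
    """Collect `--conf key=value` pairs into a dict of strings (hypothetical helper)."""
    tokens = shlex.split(text)
    settings = {}
    for flag, pair in zip(tokens[::2], tokens[1::2]):
        if flag != "--conf":
            raise ValueError(f"unexpected flag: {flag}")
        key, _, value = pair.partition("=")
        settings[key] = value
    return settings


settings = parse_conf_flags(FLAGS)
prefix = "spark.dynamicAllocation."
min_executors = int(settings[prefix + "minExecutors"])
max_executors = int(settings[prefix + "maxExecutors"])
# Assumption: the timeout is a number of seconds.
idle_timeout_minutes = int(settings[prefix + "executorIdleTimeout"]) / 60

# The executor range must be non-empty for scaling to make sense.
if min_executors > max_executors:
    raise ValueError("minExecutors must not exceed maxExecutors")

print(min_executors, max_executors, idle_timeout_minutes)
```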
      

      It started out with executors, went up to 300 when I ran a job, and then killed them all back down to 5 executors after 5 minutes of idle time (matching executorIdleTimeout=300); all working as intended.

      When I ran another job, it tried to request -187 executors:

      15/05/27 17:41:12 ERROR util.Utils: Uncaught exception in thread spark-dynamic-executor-allocation-0
      java.lang.IllegalArgumentException: Attempted to request a negative number of executor(s) -187 from the cluster manager. Please specify a positive number!
      	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:338)
      	at org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1137)
      	at org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:294)
      	at org.apache.spark.ExecutorAllocationManager.addOrCancelExecutorRequests(ExecutorAllocationManager.scala:263)
      	at org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:230)
      	at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply$mcV$sp(ExecutorAllocationManager.scala:189)
      	at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189)
      	at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189)
      	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
      	at org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:189)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      

      Now it seems I'm stuck with 5 executors in this application, presumably because some internal state is corrupt.
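
      As an illustration of how corrupted bookkeeping could lead to a negative request, here is a toy model. It is a sketch under stated assumptions, not Spark's code and not a confirmed diagnosis of this issue: the class `ToyAllocationManager`, its counters, and the "double counting" bug are all hypothetical, and the numbers are not meant to reproduce the -187 in the log.

```python
class ToyAllocationManager:
    """Hypothetical bookkeeping model; NOT Spark's ExecutorAllocationManager."""

    def __init__(self, min_executors=5, max_executors=300):
        self.min_executors = min_executors
        self.max_executors = max_executors
        self.registered = 0  # executors currently running
        self.pending = 0     # executors requested but not yet running

    def request_total(self, total):
        # Same kind of guard as the one that raised in the log above.
        if total < 0:
            raise ValueError(
                f"Attempted to request a negative number of executor(s) {total}"
            )
        return total

    def on_executor_removed(self, double_count):
        self.registered -= 1
        if double_count:
            # Assumed bug: the removal is also subtracted from `pending`,
            # which can push that counter below zero.
            self.pending -= 1

    def add_executors(self, delta):
        # Unclamped: trusts both counters as they are.
        return self.request_total(self.registered + self.pending + delta)

    def add_executors_clamped(self, delta):
        # Defensive variant: ignore a negative counter and keep the
        # target inside [min_executors, max_executors].
        wanted = self.registered + max(self.pending, 0) + delta
        wanted = max(self.min_executors, min(self.max_executors, wanted))
        return self.request_total(wanted)


def simulate(double_count):
    manager = ToyAllocationManager()
    manager.registered = 300      # scaled up for the first job
    for _ in range(295):          # idle timeout removes all but 5
        manager.on_executor_removed(double_count)
    return manager
```

      In this model, `simulate(True).add_executors(1)` raises because the stale `pending` counter drags the total below zero, while `add_executors_clamped(1)` on the same state still returns a valid target. Clamping only hides the symptom; the counters would still need to be kept consistent.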

      This Dropbox folder has the stdout from my console, including the -187 error above, as well as the event log for this application.

              People

              • Assignee: Unassigned
              • Reporter: Ryan Williams (rdub)