SPARK-8881

Standalone mode scheduling fails because cores assignment is not atomic

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.4.0, 1.5.0
    • Fix Version/s: 1.4.2, 1.5.0
    • Component/s: Deploy
    • Labels:
      None

      Description

      The current scheduling algorithm (in Master.scala) has two issues:

      1. Cores are allocated one at a time instead of `spark.executor.cores` at a time.
      2. When `spark.cores.max` / `spark.executor.cores` < num_workers, executors are not launched and the app hangs (due to 1).

      === Edit by Andrew ===

      Here's an example from the PR. Say we have 4 workers with 16 cores each, and we set `spark.cores.max` to 48 and `spark.executor.cores` to 16. Because the existing code allocates one core at a time in spread-out mode, we end up assigning 12 cores to each worker, and no executors can be launched because each one needs at least 16 cores. Instead, we should allocate 16 cores at a time.
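
      The effect can be reproduced with a small simulation. The following is a minimal Scala sketch, not the actual Master.scala code; the `assign` helper, the object name, and the hard-coded worker layout are illustrative assumptions that mirror the numbers in the example above:

      ```scala
      // Minimal sketch of spread-out core allocation across workers.
      // Not the real Master.scala logic; workers, limits, and the helper name are illustrative.
      object CoreAssignmentSketch {

        /** Round-robin over workers, handing out `step` cores per visit. */
        def assign(freeCores: Array[Int], coresToAssign: Int, step: Int): Array[Int] = {
          val assigned = Array.fill(freeCores.length)(0)
          var remaining = coresToAssign
          var pos = 0
          // Stop when the quota is exhausted or no worker can accept another `step` cores.
          while (remaining >= step && freeCores.zip(assigned).exists { case (f, a) => f - a >= step }) {
            if (freeCores(pos) - assigned(pos) >= step) {
              assigned(pos) += step
              remaining -= step
            }
            pos = (pos + 1) % freeCores.length
          }
          assigned
        }

        def main(args: Array[String]): Unit = {
          val workers = Array(16, 16, 16, 16) // 4 workers with 16 free cores each
          val coresMax = 48                   // spark.cores.max
          val coresPerExecutor = 16           // spark.executor.cores

          // Old behavior: 1 core at a time -> 12 cores on every worker, so no worker
          // can host a 16-core executor and the app hangs.
          println(assign(workers, coresMax, step = 1).mkString(", "))                // 12, 12, 12, 12

          // Fixed behavior: coresPerExecutor at a time -> three full 16-core slots.
          println(assign(workers, coresMax, step = coresPerExecutor).mkString(", ")) // 16, 16, 16, 0
        }
      }
      ```

      Stepping by one core strands 12 cores on every worker, while stepping by `spark.executor.cores` leaves whole 16-core slots that executors can actually claim.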

            People

            • Assignee:
              nravi Nishkam Ravi
            • Reporter:
              nravi Nishkam Ravi
            • Votes:
              0
            • Watchers:
              2
