SPARK-8881: Standalone mode scheduling fails because cores assignment is not atomic


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.4.0, 1.5.0
    • Fix Version/s: 1.4.2, 1.5.0
    • Component/s: Deploy
    • Labels: None

    Description

      The current scheduling algorithm (in Master.scala) has two issues:

      1. Cores are allocated one at a time instead of spark.executor.cores at a time.
      2. When spark.cores.max / spark.executor.cores < num_workers, executors are not launched and the app hangs (as a consequence of 1).

      === Edit by Andrew ===

      Here's an example from the PR. Say we have 4 workers with 16 cores each, and we set `spark.cores.max` to 48 and `spark.executor.cores` to 16. Because the existing code in spread-out mode allocates one core at a time, we end up assigning 12 cores on each worker, and no executor can be launched because each one needs at least 16 cores. Instead, we should allocate 16 cores at a time, i.e. one executor's worth (see the sketch below).
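
      Below is a minimal, hypothetical sketch of allocating cores in whole-executor increments of `spark.executor.cores`. It is not the actual Master.scala scheduling code; the object and method names are illustrative only.

      ```scala
      // Hypothetical sketch of per-executor core assignment; the real fix
      // lives in Master.scala's scheduling logic, not in this object.
      object CoreAssignmentSketch {

        /** Spread `coresToAssign` across workers in chunks of `coresPerExecutor`,
          * skipping any worker that cannot fit one more full executor. */
        def assignCores(
            freeCores: Array[Int],
            coresToAssign: Int,
            coresPerExecutor: Int): Array[Int] = {
          val assigned = Array.fill(freeCores.length)(0)
          var remaining = coresToAssign
          var progress = true
          while (remaining >= coresPerExecutor && progress) {
            progress = false
            for (i <- freeCores.indices
                 if remaining >= coresPerExecutor &&
                    freeCores(i) - assigned(i) >= coresPerExecutor) {
              assigned(i) += coresPerExecutor  // one whole executor at a time
              remaining -= coresPerExecutor
              progress = true
            }
          }
          assigned
        }

        def main(args: Array[String]): Unit = {
          // 4 workers x 16 cores, spark.cores.max = 48, spark.executor.cores = 16
          val assigned = assignCores(Array(16, 16, 16, 16), 48, 16)
          println(assigned.mkString(", "))  // 16, 16, 16, 0 -> three 16-core executors
          // The old spread-out logic handed out one core at a time, leaving
          // 12 free cores on every worker and launching no executors at all.
        }
      }
      ```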



          People

             Assignee: Nishkam Ravi (nravi)
             Reporter: Nishkam Ravi (nravi)
             Votes: 0
             Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:
