[SPARK-8881] Standalone mode scheduling fails because cores assignment is not atomic - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.4.0, 1.5.0
Fix Version/s: 1.4.2, 1.5.0
Component/s: Deploy
Labels: None
Target Version/s: 1.4.2, 1.5.0

Description

The current scheduling algorithm (in Master.scala) has two issues:

1. Cores are allocated one at a time instead of spark.executor.cores at a time.
2. When spark.cores.max / spark.executor.cores < num_workers, no executors are launched and the app hangs (a consequence of 1).

=== Edit by Andrew ===

Here's an example from the PR. Say we have 4 workers with 16 cores each, and we set `spark.cores.max` to 48 and `spark.executor.cores` to 16. In spread-out mode the existing code allocates 1 core at a time, so the 48 cores are spread evenly and each worker ends up with 12. No executor can be launched, because each one needs at least 16 cores on a single worker. Instead, we should allocate 16 cores at a time.
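
To make the arithmetic concrete, here is a minimal simulation of the two strategies. It is not the code in Master.scala: the function names and the round-robin loops are hypothetical, written only to reproduce the numbers in the example above, and it assumes every worker starts with all of its cores free.

```python
def allocate_one_core_at_a_time(free_cores, cores_max, cores_per_executor):
    """Hypothetical model of the behaviour described in this issue:
    hand out one core per step, round-robin across workers."""
    assigned = [0] * len(free_cores)
    # Never try to hand out more cores than the cluster actually has.
    remaining = min(cores_max, sum(free_cores))
    pos = 0
    while remaining > 0:
        if free_cores[pos] - assigned[pos] > 0:
            assigned[pos] += 1
            remaining -= 1
        pos = (pos + 1) % len(free_cores)
    # An executor needs cores_per_executor cores on a single worker.
    executors = [a // cores_per_executor for a in assigned]
    return assigned, executors


def allocate_executor_at_a_time(free_cores, cores_max, cores_per_executor):
    """Hypothetical model of the proposed behaviour:
    hand out cores_per_executor cores per step, round-robin across workers."""
    assigned = [0] * len(free_cores)
    remaining = cores_max
    progress = True
    while progress:
        progress = False
        for i in range(len(free_cores)):
            fits_on_worker = free_cores[i] - assigned[i] >= cores_per_executor
            if remaining >= cores_per_executor and fits_on_worker:
                assigned[i] += cores_per_executor
                remaining -= cores_per_executor
                progress = True
    executors = [a // cores_per_executor for a in assigned]
    return assigned, executors


# The scenario from the description: 4 workers x 16 cores,
# spark.cores.max = 48, spark.executor.cores = 16.
workers = [16, 16, 16, 16]
print(allocate_one_core_at_a_time(workers, 48, 16))
print(allocate_executor_at_a_time(workers, 48, 16))
```

In this model the first call leaves 12 cores on each worker and launches no executors, while the second places one 16-core executor on each of three workers and leaves the fourth unused.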