The current scheduling algorithm (in Master.scala) has two issues:
1. Cores are allocated one at a time instead of `spark.executor.cores` at a time (see the sketch after this list).
2. When `spark.cores.max` / `spark.executor.cores` is less than the number of workers, no executors are launched and the app hangs. This is a consequence of issue 1.
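For illustration, here is a minimal, self-contained sketch of the problematic spread-out allocation. This is not the actual Master.scala code; the object/method names and the simplified round-robin loop are my own, but the one-core-per-round behavior is the one described above:

```scala
object OneCoreAtATimeSketch {
  /** Spread-out allocation as currently written: hand out one core per round. */
  def assignCores(coresFree: Array[Int], coresMax: Int): Array[Int] = {
    val assigned = Array.fill(coresFree.length)(0)
    var coresLeft = coresMax
    var pos = 0
    def hasFree(i: Int): Boolean = coresFree(i) - assigned(i) > 0
    while (coresLeft > 0 && coresFree.indices.exists(hasFree)) {
      if (hasFree(pos)) {
        assigned(pos) += 1 // one core at a time: this is issue 1
        coresLeft -= 1
      }
      pos = (pos + 1) % coresFree.length // round-robin across workers
    }
    assigned
  }

  def main(args: Array[String]): Unit = {
    // 4 workers with 16 cores each, spark.cores.max = 48:
    println(assignCores(Array.fill(4)(16), 48).mkString(", "))
    // Prints "12, 12, 12, 12". With spark.executor.cores = 16, no worker
    // is left with a full executor's worth of cores, so nothing launches.
  }
}
```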
=== Edit by Andrew ===
Here's an example from the PR. Say we have 4 workers with 16 cores each, and we set `spark.cores.max` to 48 and `spark.executor.cores` to 16. Because the existing code allocates one core at a time in spread-out mode, we end up allocating 12 cores on each worker, and no executors can be launched because each one needs at least 16 cores. Instead, we should allocate 16 cores at a time.
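Under the same assumptions as the sketch above (hypothetical names, simplified loop), the fix amounts to allocating `spark.executor.cores` cores per round, so every allocation leaves the worker with a launchable executor:

```scala
object PerExecutorAllocationSketch {
  /** Spread-out allocation with the fix: one executor's worth per round. */
  def assignCores(
      coresFree: Array[Int],
      coresMax: Int,
      coresPerExecutor: Int): Array[Int] = {
    val assigned = Array.fill(coresFree.length)(0)
    var coresLeft = coresMax
    var pos = 0
    def canHostExecutor(i: Int): Boolean =
      coresFree(i) - assigned(i) >= coresPerExecutor
    while (coresLeft >= coresPerExecutor &&
           coresFree.indices.exists(canHostExecutor)) {
      if (canHostExecutor(pos)) {
        assigned(pos) += coresPerExecutor // a full executor's worth at once
        coresLeft -= coresPerExecutor
      }
      pos = (pos + 1) % coresFree.length
    }
    assigned
  }

  def main(args: Array[String]): Unit = {
    // Same scenario: 4 workers x 16 cores, cores.max = 48, executor.cores = 16.
    println(assignCores(Array.fill(4)(16), 48, 16).mkString(", "))
    // Prints "16, 16, 16, 0": three 16-core executors can now launch.
  }
}
```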