Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
I am using the capacity scheduler on a cluster of 3 NodeManagers (NMs), each with 9 vcores. I ran a Spark job with 4 executors of 5 cores each. As expected, only 1 executor could not start and had a container reserved for it, but in practice more containers were reserved than that one. As a result, I cannot run other, smaller jobs. Looking at the capacity scheduler, the 'needContainers' method in LeafQueue.java includes a 'starvation' computation, and this appears to be what causes more containers to be reserved than required. Any ideas or suggestions on this?
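
To make the numbers concrete, here is a minimal sketch of the arithmetic. It is not the actual scheduler code. The class `ReservationSketch` and both methods are hypothetical, the application master's own container is ignored, and the `needContainers` formula is a simplified model of the starvation check as I read it in LeafQueue.java. The parameter names and the sample values (re-reservation count, node factor, minimum allocation factor) are assumptions and should be checked against the Hadoop version in use.

```java
// Hypothetical, simplified model; not copied from Hadoop.
class ReservationSketch {

    // Executors that fit when each node is filled independently
    // (an executor cannot span two NodeManagers).
    public static int executorsThatFit(int nodes, int vcoresPerNode, int vcoresPerExecutor) {
        return nodes * (vcoresPerNode / vcoresPerExecutor);
    }

    // Assumed shape of the check: a positive "starvation" term can keep
    // the result true even when reserved >= required.
    public static boolean needContainers(int requiredContainers,
                                         int reservedContainers,
                                         int reReservations,
                                         float nodeFactor,
                                         float minimumAllocationFactor) {
        int starvation = 0;
        if (reservedContainers > 0) {
            // Grows with the number of re-reservations per reserved container,
            // scaled down for requests that need a large share of a node.
            starvation = (int) ((reReservations / (float) reservedContainers)
                    * (1.0f - Math.min(nodeFactor, minimumAllocationFactor)));
        }
        return (starvation + requiredContainers) - reservedContainers > 0;
    }

    public static void main(String[] args) {
        // 3 NMs x 9 vcores, 5-vcore executors: one executor per node.
        int fit = executorsThatFit(3, 9, 5);
        System.out.println("executors that fit: " + fit);        // 3
        System.out.println("executors left waiting: " + (4 - fit)); // 1

        // One outstanding request, one reservation, no re-reservations yet.
        System.out.println(needContainers(1, 1, 0, 0.5f, 0.75f)); // false

        // Same request after 4 re-reservations (illustrative values).
        System.out.println(needContainers(1, 1, 4, 0.5f, 0.75f)); // true
    }
}
```

With these assumed inputs, the check stops asking for containers once the single missing executor has a reservation, but starts asking again after a few re-reservations. That matches the behavior described above, where reservations accumulate beyond the one executor that cannot start.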