Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
3.1.0
None
None
Description
https://github.com/apache/spark/pull/21758#discussion_r205652317
We should improve cluster resource management to address the following issues:
- With dynamic resource allocation enabled, an application may acquire some executors, but not enough to launch all the tasks in a barrier stage. It then releases them when the executor idle timeout expires, and later acquires executors again, repeating the cycle.
- Two concurrent applications can deadlock. Each application may acquire some resources, but not enough to launch all the tasks in a barrier stage. After hitting the idle timeout and releasing them, both may acquire resources again, and so continually trade resources with each other without either making progress.
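The second scenario can be illustrated with a small toy simulation. This is only a sketch under simplifying assumptions and does not use or model Spark's actual scheduler: the function `simulate`, its parameters, and the round-robin grant policy are all hypothetical, chosen to show how two applications that each need all their slots at once can keep acquiring and releasing a partial share.

```python
def simulate(total_slots, needed, rounds, idle_timeout):
    """Toy model (not Spark code): two apps compete for `total_slots`.

    Each app needs `needed` slots at the same time to launch its barrier
    stage. Slots held by an app that cannot launch sit idle and are
    released after `idle_timeout` rounds.
    """
    held = [0, 0]              # slots currently held by each app
    idle = [0, 0]              # rounds each app has sat on idle slots
    launched = [False, False]  # whether each app launched its barrier stage
    history = []               # slots held by (app 0, app 1) after each round

    for _ in range(rounds):
        # Assumed grant policy: hand out free slots one at a time,
        # round-robin, to apps that are still short of `needed`.
        free = total_slots - sum(held)
        while free > 0 and any(
            held[i] < needed and not launched[i] for i in (0, 1)
        ):
            for i in (0, 1):
                if free > 0 and held[i] < needed and not launched[i]:
                    held[i] += 1
                    free -= 1

        for i in (0, 1):
            if launched[i]:
                continue
            if held[i] >= needed:
                # All tasks of the barrier stage can start together.
                launched[i] = True
            else:
                # Partial allocation: executors idle, then get released.
                idle[i] += 1
                if idle[i] >= idle_timeout:
                    held[i] = 0
                    idle[i] = 0
        history.append(tuple(held))

    return launched, history


# 4 slots, each app needs 3: both apps cycle between holding 2 and 0 slots.
stuck, stuck_history = simulate(total_slots=4, needed=3, rounds=10, idle_timeout=2)

# 6 slots, each app needs 3: both apps launch in the first round.
ok, ok_history = simulate(total_slots=6, needed=3, rounds=10, idle_timeout=2)
```

In the first call neither application ever launches, even though the cluster has enough slots to run one barrier stage at a time. The model ignores task completion and real allocation latency, so it only shows the shape of the problem, not its timing.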