Description
Because of the problem described in MESOS-6112 (https://issues.apache.org/jira/browse/MESOS-6112), running more than 5 Mesos frameworks concurrently can result in starvation. For example, running 10 dispatchers could result in 5 of them receiving all the offers, even if they have no jobs to launch. We must increase the refuse_seconds timeout to solve this problem. Another option would have been to implement suppress/revive, but that can itself cause starvation because Mesos RPC calls are unreliable.
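The effect of refuse_seconds can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions, not the actual Mesos allocator: it assumes one offer per 1-second tick, a fixed framework order, and that every framework is idle and declines each offer with a filter lasting refuse_seconds. The function name `simulate` and its parameters are hypothetical.

```python
def simulate(num_frameworks, refuse_seconds, ticks):
    """Toy model: each tick, one offer goes to the first framework
    (in fixed order) that has no active decline filter."""
    # Time at which each framework's decline filter lapses.
    filter_expires = [0] * num_frameworks
    offers_received = [0] * num_frameworks

    for now in range(ticks):
        for fw in range(num_frameworks):
            if filter_expires[fw] <= now:
                offers_received[fw] += 1
                # The idle framework declines and asks not to be
                # re-offered these resources for refuse_seconds.
                filter_expires[fw] = now + refuse_seconds
                break
        # If every framework is filtered, the offer goes unused this tick.
    return offers_received


if __name__ == "__main__":
    # Short timeout: the first five filters lapse before later frameworks are reached.
    print(simulate(10, 5, 60))    # [12, 12, 12, 12, 12, 0, 0, 0, 0, 0]
    # Long timeout: the offer works its way down to every framework.
    print(simulate(10, 120, 60))  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

In this model, a 5-second timeout with 10 frameworks leaves the last 5 without any offers, which mirrors the "5 of 10 dispatchers" example above. A timeout longer than the time needed to cycle through all frameworks lets each of them see an offer.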
Issue Links
- is related to: SPARK-19703 Add Suppress/Revive support to the Mesos Spark Driver (Resolved)
- relates to: SPARK-20483 Mesos Coarse mode may starve other Mesos frameworks if max cores is not a multiple of executor cores (Resolved)