Details
Type: Bug
Status: Accepted
Priority: Critical
Resolution: Unresolved
None
None
Description
Currently, the implementation of partition awareness reuses --agent_removal_rate_limit when marking agents as unreachable. As a result, partition-aware frameworks are subject to the agent removal rate limit, when they would rather see unreachability information immediately and impose their own rate limiting.
Rather than waiting for support for non-partition-aware frameworks to be removed per MESOS-5948 (which may not happen for a long time), we should fix the implementation so that marking agents unreachable is not gated behind the agent removal rate limit.
Marking this as a bug because, from the user's perspective, it does not behave as expected. There should be a separate flag for rate-limiting unreachability marking, although unreachability marking likely does not need rate limiting at all, since the intention was for frameworks to impose their own rate limiting when replacing tasks.
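To make the difference concrete, here is a minimal Python sketch contrasting the current and requested behavior. It is a model, not the Mesos master code (which is C++): the names `RateLimiter`, `Master`, `on_agent_partitioned`, and `gate_unreachable` are all hypothetical, and the sliding-window limiter (one permit per 1200 seconds) merely stands in for whatever value the removal rate limit is configured with.

```python
import collections


class RateLimiter:
    """Hypothetical sliding-window limiter: at most `permits` grants
    within any `interval` seconds (a stand-in for the removal rate limit)."""

    def __init__(self, permits, interval):
        self.permits = permits
        self.interval = interval
        self.history = collections.deque()  # timestamps of granted operations

    def try_acquire(self, now):
        # Forget grants that have fallen out of the window.
        while self.history and now - self.history[0] >= self.interval:
            self.history.popleft()
        if len(self.history) < self.permits:
            self.history.append(now)
            return True
        return False


class Master:
    """Hypothetical model of how a master reacts to a partitioned agent."""

    def __init__(self, removal_limiter, gate_unreachable):
        self.removal_limiter = removal_limiter
        # True models the current behavior (marking shares the removal limit);
        # False models the requested behavior (marking is immediate).
        self.gate_unreachable = gate_unreachable
        self.unreachable = []  # agents already marked unreachable
        self.pending = []      # agents held back by the limiter (not retried here)

    def on_agent_partitioned(self, agent_id, now):
        if self.gate_unreachable and not self.removal_limiter.try_acquire(now):
            # Current behavior: the marking waits behind the removal limit,
            # so partition-aware frameworks learn about it late.
            self.pending.append(agent_id)
            return
        # Requested behavior: mark immediately and let partition-aware
        # frameworks apply their own rate limiting when replacing tasks.
        self.unreachable.append(agent_id)


gated = Master(RateLimiter(1, 1200), gate_unreachable=True)
ungated = Master(RateLimiter(1, 1200), gate_unreachable=False)
for agent in ["a1", "a2", "a3"]:
    gated.on_agent_partitioned(agent, now=0)
    ungated.on_agent_partitioned(agent, now=0)
```

With three agents partitioned at the same moment, the gated model marks only `a1` and holds `a2` and `a3` back, while the ungated model marks all three at once, which is what partition-aware frameworks expect.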
Issue Links
- blocks
  - MESOS-5948 Remove rate-limiting for agent removal (Open)
- is related to
  - MESOS-8386 Inaccurate rate limiting of marking agents unreachable after master failover (Open)