Details
- Type: Improvement
- Status: Accepted
- Priority: Critical
- Resolution: Unresolved
- None
- None
- None
Description
Currently, we do not have heartbeats for executor <-> agent communication. This is especially problematic when IPFilters are enabled, because the default conntrack timeout for idle established TCP connections is 5 days. Once that timeout elapses, the executor is no longer notified via a socket disconnection when the agent process restarts. If the executor has not re-registered by the time the agent completes recovery, it is then killed.
Enabling application-level heartbeats or TCP keepalives is one possible way to fix this issue.
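A minimal sketch of the TCP keepalive option, assuming a Linux host and Python's standard `socket` module. The helper name `enable_keepalive` and the 60 s / 10 s / 5-probe values are illustrative assumptions, not Mesos or libprocess settings. With keepalive on, an idle connection exchanges a probe and an ACK roughly every `idle_s` seconds; each of those packets refreshes the conntrack entry, so it never reaches the 5-day timeout. A peer that stops answering is detected after about `idle_s + interval_s * probes` seconds.

```python
import socket


# Hypothetical helper (not Mesos/libprocess code): enable TCP keepalive so an
# otherwise idle executor <-> agent connection still carries periodic traffic.
# TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific options.
def enable_keepalive(sock: socket.socket, idle_s: int = 60,
                     interval_s: int = 10, probes: int = 5) -> None:
    # Turn keepalive probing on for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first probe is sent.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    # Seconds between probes while the peer is not answering.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    # Unanswered probes before the kernel declares the connection dead.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # Read back the idle time the kernel actually recorded for this socket.
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE))
    s.close()
```

The idle time only needs to be far below the conntrack timeout. Application-level heartbeats would achieve the same effect above the transport layer and would also cover failures that TCP keepalive cannot see, such as a hung process that still holds an open socket.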
Issue Links
- is a clone of
  - MESOS-7564 Introduce a heartbeat mechanism for v1 HTTP executor <-> agent communication. (Resolved)
- is related to
  - MESOS-7569 Allow "old" executors with half-open connections to be preserved during agent upgrade / restart. (Resolved)
  - MESOS-5361 Consider introducing TCP KeepAlive for Libprocess sockets. (Accepted)