Details
- Bug
- Status: Resolved
- Critical
- Resolution: Done
- None
- Mesos Foundation R7 Sprint 33, Mesos Foundations R8 Sprint 34, Mesos Foundations R8 Sprint 35
- 5
Description
Currently, we do not have heartbeats for executor <-> agent communication. This is especially problematic when IPFilters are enabled, since the default conntrack keep-alive timeout is 5 days. Once that timeout has elapsed on an idle connection, the executor is not notified through a socket disconnection when the agent process restarts. The executor is then killed if it has not re-registered by the time the agent recovery process completes.
Enabling application-level heartbeats or TCP keepalives is one possible way to fix this issue.
We should also update the executor API documentation to explain the new behavior.
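For illustration only, the TCP keepalive option mentioned above could look roughly like the sketch below. It is plain Python against the standard `socket` module, not Mesos or libprocess code. The helper name `enable_keepalive` and the timing values are assumptions chosen for the example. The only requirement taken from the description is that probes must be sent far more often than the conntrack timeout.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Enable TCP keepalive probes on an existing TCP socket.

    Hypothetical helper for illustration; the default timings are
    assumptions, not values taken from Mesos.
    """
    # Portable switch: ask the kernel to probe an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Platform-specific tuning knobs. Each is applied only if this
    # platform's socket module exposes it.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before the connection is dropped
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive enabled:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

Keepalive probes keep the connection-tracking entry fresh and surface a dead peer as a socket error. An application-level heartbeat would additionally show that the remote process is still responsive, not just that its TCP stack is reachable.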
Issue Links
- is cloned by
  - MESOS-7568 Introduce a heartbeat mechanism for v0 executor <-> agent links. (Accepted)
- is related to
  - MESOS-9727 Heartbeat calls from executor to agent are reported as errors (Resolved)
  - MESOS-5361 Consider introducing TCP KeepAlive for Libprocess sockets. (Accepted)
- relates to
  - MESOS-8366 Replace the command executor with the default executor. (Reviewable)
  - MESOS-540 Executor health checking. (Open)
  - MESOS-9258 Prevent subscribers to the master's event stream from leaking connections (Resolved)