Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
1.13.1, 1.12.4
Description
With FLINK-23202 it should now be possible to detect when a remote RPC endpoint is no longer reachable. The HeartbeatManager can use this to mark a heartbeat target as unreachable. That way, Flink can react faster to the loss of a component without having to wait for the heartbeat timeout to expire. This will result in faster recoveries (e.g. if a TaskExecutor dies).
With this change we can better trade off the speed of detecting dead TaskManagers against tolerance for an unstable or overloaded network where heartbeat messages are delayed.
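To illustrate the idea, here is a minimal sketch of how a heartbeat monitor could mark a target as unreachable after a number of consecutive failed heartbeat RPCs instead of waiting for the timeout. This is not Flink's actual HeartbeatManager code: the class HeartbeatMonitorSketch, the failureThreshold parameter, and the onUnreachable callback are hypothetical names chosen for illustration. It only assumes that a failed send completes the RPC result future exceptionally, as described in FLINK-23202.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical sketch; not Flink's actual HeartbeatManager implementation. */
final class HeartbeatMonitorSketch {
    private final int failureThreshold;   // assumed knob: tolerated consecutive RPC failures
    private final Runnable onUnreachable; // assumed callback: starts recovery for the target
    private int consecutiveFailures = 0;
    private boolean unreachable = false;

    HeartbeatMonitorSketch(int failureThreshold, Runnable onUnreachable) {
        if (failureThreshold < 1) {
            throw new IllegalArgumentException("failureThreshold must be >= 1");
        }
        this.failureThreshold = failureThreshold;
        this.onUnreachable = onUnreachable;
    }

    /** Sends one heartbeat request and records whether the RPC could be delivered. */
    void requestHeartbeat(Supplier<CompletableFuture<Void>> heartbeatRpc) {
        heartbeatRpc.get().whenComplete((ignored, failure) -> report(failure == null));
    }

    /** A success resets the counter; enough consecutive failures mark the target unreachable. */
    private synchronized void report(boolean delivered) {
        if (unreachable) {
            return; // already reported; notify only once
        }
        if (delivered) {
            consecutiveFailures = 0;
            return;
        }
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            unreachable = true;
            onUnreachable.run();
        }
    }

    synchronized boolean isUnreachable() {
        return unreachable;
    }
}
```

In this sketch, a low threshold reacts quickly to a dead TaskManager, while a higher threshold tolerates occasional delivery failures on an unstable network. The regular heartbeat timeout would still be needed as a fallback for targets that remain reachable but stop responding.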
Issue Links
- depends upon: FLINK-23202 RpcService should fail result futures if messages could not be sent (Closed)
- links to