Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Won't Do
1.3.0
None
Description
When a framework disconnects unexpectedly from Mesos, the framework's Mesos tasks continue to run for a configurable period of time known as the failover timeout. If the framework reconnects to Mesos after the timeout has expired, Mesos rejects the connection attempt. It is expected that the framework discard the previous framework ID and then connect as a new framework.
When Flink is in this situation, the only recourse is to manually delete the ZK state where the framework ID is kept. Let's improve the logic of the Mesos RM to automate that.
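A minimal sketch of the proposed automation, assuming the rejection can be detected at registration time. Every name here (`FrameworkIdStore`, `FakeMaster`, `FrameworkRejected`, `connect`) is hypothetical and stands in for the ZK-backed store and the Mesos master; none of it is Flink's or Mesos's actual API.

```python
from dataclasses import dataclass, field
from typing import Optional


class FrameworkRejected(Exception):
    """Hypothetical error: the master refused a framework ID whose failover timeout expired."""


@dataclass
class FrameworkIdStore:
    """In-memory stand-in for the ZK node that keeps the framework ID."""
    framework_id: Optional[str] = None

    def clear(self) -> None:
        self.framework_id = None


@dataclass
class FakeMaster:
    """Toy master that only accepts framework IDs it still remembers."""
    known_ids: set = field(default_factory=set)
    counter: int = 0

    def register(self, framework_id: Optional[str]) -> str:
        if framework_id is None:
            # No ID supplied: treat as a brand-new framework and assign one.
            self.counter += 1
            new_id = f"framework-{self.counter}"
            self.known_ids.add(new_id)
            return new_id
        if framework_id not in self.known_ids:
            # The ID is unknown, e.g. because its failover timeout expired.
            raise FrameworkRejected(framework_id)
        return framework_id


def connect(master: FakeMaster, store: FrameworkIdStore) -> str:
    """Register with the stored ID; on rejection, discard it and register as new."""
    try:
        assigned = master.register(store.framework_id)
    except FrameworkRejected:
        store.clear()  # automates the manual deletion of the stored ID
        assigned = master.register(None)
    store.framework_id = assigned
    return assigned
```

In this sketch the stale ID is dropped only after the master has explicitly rejected it, so a framework that reconnects within the failover timeout keeps its ID and its running tasks.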