Details
Type: Improvement
Status: Closed
Priority: Minor
Resolution: Won't Do
1.2.0, 1.3.0
None
None
Description
If the Mesos framework running Flink terminates (e.g. because the maximum number of container restarts was exceeded), the JobClientActor is not properly informed. As a consequence, the client terminates only after the JobClientActor detects that it has lost the connection to the JobManager (JobClientActorConnectionTimeoutException). The current default for this timeout is 60 s, which is a long time to wait before noticing the connection loss when the cluster has already terminated.
I think it would be better to notify the JobClientActor explicitly. This would allow it to print a more informative message for the user and to react more quickly.
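To illustrate the difference between the two behaviours, here is a minimal, hypothetical Java sketch (Java 9+ for `CompletableFuture.delayedExecutor`). It does not use Flink or Akka APIs: `ClientNotificationSketch`, `awaitOutcome` and the termination message are invented for illustration, and a `CompletableFuture` stands in for the message the JobClientActor would receive. The timeout is scaled down from the 60 s default so the example runs quickly.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; names are illustrative and not part of Flink.
public class ClientNotificationSketch {

    // Waits for an explicit termination notice, but falls back to a
    // connection-timeout message if no notice arrives within timeoutMillis.
    public static String awaitOutcome(CompletableFuture<String> notice, long timeoutMillis) {
        try {
            return notice.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "Lost connection to JobManager after " + timeoutMillis + " ms";
        } catch (ExecutionException e) {
            return "Job failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "Interrupted while waiting";
        }
    }

    public static void main(String[] args) {
        long timeoutMillis = 500; // scaled-down stand-in for the 60 s default

        // (a) Current behaviour: nobody notifies the client, so it waits
        //     for the full timeout before giving up.
        CompletableFuture<String> silent = new CompletableFuture<>();
        long start = System.nanoTime();
        String a = awaitOutcome(silent, timeoutMillis);
        System.out.printf("%s (%d ms)%n", a, (System.nanoTime() - start) / 1_000_000);

        // (b) Proposed behaviour: the cluster side pushes a notice shortly
        //     after the framework terminates (simulated here after 50 ms).
        CompletableFuture<String> notified = new CompletableFuture<>();
        CompletableFuture.runAsync(
                () -> notified.complete("Mesos framework terminated: container restart limit exceeded"),
                CompletableFuture.delayedExecutor(50, TimeUnit.MILLISECONDS));
        start = System.nanoTime();
        String b = awaitOutcome(notified, timeoutMillis);
        System.out.printf("%s (%d ms)%n", b, (System.nanoTime() - start) / 1_000_000);
    }
}
```

In case (a) the client only reports a generic connection loss once the timeout expires. In case (b) it returns as soon as the notice arrives and can show the actual cause, which is the improvement this issue asks for. The timeout stays in place as a fallback for failures that produce no notice.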