[FLINK-5471] Properly inform JobClientActor about terminated Mesos framework - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Won't Do
Affects Version/s: 1.2.0, 1.3.0
Fix Version/s: None
Component/s: Deployment / Mesos
Labels: None

Description

If the Mesos framework running Flink terminates (e.g. because it exceeded the number of container restarts), the JobClientActor is not properly informed. As a consequence, the client terminates only after the JobClientActor detects that it has lost the connection to the JobManager (JobClientActorConnectionTimeoutException). The current default value for the timeout is 60s, which is a long time to wait before detecting the connection loss after a termination.

I think it would be better to notify the JobClientActor explicitly, which would allow it to print a more informative message for the user and to react more quickly.
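To illustrate the difference between the two behaviours, here is a minimal Python sketch, not Flink code. It models the client's wait as a blocking read on a mailbox. The message names and the `await_outcome` function are hypothetical and exist only for this illustration; the real JobClientActor is an Akka actor with its own message types.

```python
import queue

# Hypothetical message names for illustration; these are not Flink's actor messages.
FRAMEWORK_TERMINATED = "FrameworkTerminated"
JOB_FINISHED = "JobFinished"


def await_outcome(mailbox: queue.Queue, connection_timeout_s: float) -> str:
    """Block until a message arrives or the connection timeout expires."""
    try:
        message = mailbox.get(timeout=connection_timeout_s)
    except queue.Empty:
        # Behaviour described in this issue: nothing tells the client,
        # so it gives up only once the full timeout has elapsed.
        return "timed out waiting for the JobManager"
    if message == FRAMEWORK_TERMINATED:
        # Proposed behaviour: an explicit notification ends the wait at once
        # and lets the client report the actual cause.
        return "Mesos framework terminated"
    if message == JOB_FINISHED:
        return "job finished"
    return "unexpected message"


if __name__ == "__main__":
    # No notification: the client waits for the whole (shortened) timeout.
    print(await_outcome(queue.Queue(), connection_timeout_s=0.1))

    # With notification: the client returns immediately despite a 60s timeout.
    notified = queue.Queue()
    notified.put(FRAMEWORK_TERMINATED)
    print(await_outcome(notified, connection_timeout_s=60.0))
```

In the second case the wait ends as soon as the message is read, so the length of the timeout no longer determines how quickly the user learns about the termination.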

People

Assignee: Unassigned

Reporter: Till Rohrmann

Votes: 0

Watchers: 1

Dates

Created: 12/Jan/17 15:09

Updated: 26/Feb/19 17:06

Resolved: 26/Feb/19 17:06