Details
- Type: Improvement
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- 1.9.2, 1.10.0, 1.11.0
Description
Currently, Flink waits to acknowledge a job submission until the corresponding JobManager has been created. Because creating the JobManager also involves creating the ExecutionGraph and potentially performing FS operations, it can take some time. If the user has configured web.timeout too low, the submission can time out, reporting only a TimeoutException to the user.
I propose changing the notion of job submission slightly. Instead of waiting until the JobManager has been created, a job submission would be complete once all job-relevant files have been uploaded to the Dispatcher and the Dispatcher has been told about them. Creating the JobManager would then belong to the actual job execution. Consequently, any problem that occurs while creating the JobManager would result in a job failure.
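The proposed semantics can be illustrated with a minimal sketch. This is not Flink's actual Dispatcher code: the class `SubmissionSketch`, its methods, and the `Ack` type are hypothetical names chosen for illustration, and the `Status` values only loosely mirror Flink's job states. The point is that the acknowledgement is returned before the potentially slow JobManager creation runs, and an error during creation shows up as a failed job rather than a failed submission.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

// Hypothetical stand-in for a dispatcher; not part of Flink's API.
final class SubmissionSketch {
    enum Status { INITIALIZING, RUNNING, FAILED }
    enum Ack { ACKNOWLEDGED }

    private final Map<String, Status> jobs = new ConcurrentHashMap<>();
    private final Executor ioExecutor;

    SubmissionSketch(Executor ioExecutor) {
        this.ioExecutor = ioExecutor;
    }

    // Registers the job and acknowledges immediately. The slow creation step
    // (ExecutionGraph construction, FS operations) is handed to the executor.
    CompletableFuture<Ack> submitJob(String jobId, Runnable createJobManager) {
        jobs.put(jobId, Status.INITIALIZING);
        CompletableFuture.runAsync(createJobManager, ioExecutor)
            .whenComplete((ignored, error) ->
                // A creation error becomes a job failure, not a submission failure.
                jobs.put(jobId, error == null ? Status.RUNNING : Status.FAILED));
        return CompletableFuture.completedFuture(Ack.ACKNOWLEDGED);
    }

    Status statusOf(String jobId) {
        return jobs.get(jobId);
    }
}
```

With this shape, a client that only waits for the acknowledgement is no longer affected by how long the creation step takes. It learns about initialization problems by querying the job status afterwards.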
Attachments
Issue Links
- blocks
  - FLINK-16867 Simplify default timeout configuration (Open)
- causes
  - FLINK-19237 LeaderChangeClusterComponentsTest.testReelectionOfJobMaster failed with "NoResourceAvailableException: Could not allocate the required slot within slot request timeout" (Closed)
  - FLINK-21659 CheckpointSettings not properly exposed for initializing jobs (Closed)
- fixes
  - FLINK-16429 failed to restore flink job from checkpoints due to unhandled exceptions (Closed)
- is related to
  - FLINK-16018 Improve error reporting when submitting batch job (instead of AskTimeoutException) (Resolved)
  - FLINK-19000 Forward JobStatus.INITIALIZING timestamp to ExecutionGraph (Closed)
- relates to
  - FLINK-19410 RestAPIStabilityTest does not assert on enum changes (Closed)
  - FLINK-19037 Introduce proper IO executor in Dispatcher (Closed)
  - FLINK-19219 Run JobManager initialization in a separate thread, to make it cancellable (Closed)
- links to