Details
Type: Sub-task
Status: Closed
Priority: Major
Resolution: Done
1.14.0
None
Description
This ticket is used to verify the result of FLINK-21110.
It should check whether scheduling works well for large-scale jobs and measure the scheduling performance, with a real job running on a cluster.
The conclusion should include, for a 10000 -> 10000 job with all-to-all connections:
1. time of job initialization on the master (job received -> scheduling started)
2. time of task deployment (task deployment started -> all tasks in RUNNING)
3. time to make the task failure recovery decision (JM notified about task failure -> tasks to restart decided)
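
Each of the three items above is the interval between two events, so the numbers can be derived once a timestamp is recorded for each event. The following is a minimal sketch of that arithmetic, not part of Flink: the class `SchedulingTimestamps`, its field names, and the function `scheduling_durations` are hypothetical, and the timestamps are assumed to be in milliseconds and collected by whatever means the test uses (for example, from JobManager logs).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulingTimestamps:
    """Hypothetical container for the six events named in the ticket (ms)."""
    job_received: int
    scheduling_started: int
    deployment_started: int
    all_tasks_running: int
    failure_notified: int
    restart_decided: int


def scheduling_durations(ts: SchedulingTimestamps) -> dict:
    """Return the three intervals the ticket asks for, in milliseconds."""
    durations = {
        # 1. job received -> scheduling started
        "job_initialization_ms": ts.scheduling_started - ts.job_received,
        # 2. task deployment started -> all tasks in RUNNING
        "task_deployment_ms": ts.all_tasks_running - ts.deployment_started,
        # 3. JM notified about task failure -> tasks to restart decided
        "failover_decision_ms": ts.restart_decided - ts.failure_notified,
    }
    # A negative interval means the events were recorded out of order.
    for name, value in durations.items():
        if value < 0:
            raise ValueError(f"{name} is negative; check event ordering")
    return durations


if __name__ == "__main__":
    # Placeholder values for illustration only, not measured results.
    example = SchedulingTimestamps(
        job_received=1_000,
        scheduling_started=1_500,
        deployment_started=1_500,
        all_tasks_running=9_500,
        failure_notified=20_000,
        restart_decided=20_040,
    )
    for name, value in scheduling_durations(example).items():
        print(f"{name}: {value}")
```

Keeping the three intervals separate, rather than reporting one end-to-end time, matches the breakdown the ticket asks for and makes it easier to see which phase dominates at this scale.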