[FLINK-23826] Verify optimized scheduler performance for large-scale jobs - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: 1.14.0
Fix Version/s: 1.14.0
Component/s: Runtime / Coordination
Labels: None

Description

This ticket is used to verify the result of FLINK-21110.
It should check, with a real job running on a cluster, whether the scheduling of large-scale jobs works well and how it performs.

The conclusion should include, for a 10000 -> 10000 job with all-to-all connections:
1. time of job initialization on master (job received -> scheduling started)
2. time of task deployment (task deploying started -> all tasks in RUNNING)
3. time of making task failure recovery decision (JM notified about task failure -> tasks to restart decided)
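
The three intervals above can be derived from six event timestamps. The following is only a minimal sketch of that bookkeeping, assuming the timestamps have already been collected somehow (for example from JobManager logs). The names `SchedulingTimestamps`, `summarize`, and `all_to_all_edges` are hypothetical and are not part of Flink.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulingTimestamps:
    """Hypothetical event timestamps in milliseconds since a common origin."""
    job_received: int        # master receives the job
    scheduling_started: int  # scheduler starts scheduling
    deploying_started: int   # first task deployment begins
    all_tasks_running: int   # last task reaches RUNNING
    failure_notified: int    # JM is notified about a task failure
    restart_decided: int     # set of tasks to restart has been decided


def all_to_all_edges(upstream: int, downstream: int) -> int:
    """Number of producer/consumer pairs when every upstream subtask
    is connected to every downstream subtask."""
    return upstream * downstream


def summarize(ts: SchedulingTimestamps) -> dict:
    """Compute the three durations listed in the ticket, in milliseconds."""
    return {
        "job_initialization_ms": ts.scheduling_started - ts.job_received,
        "task_deployment_ms": ts.all_tasks_running - ts.deploying_started,
        "failover_decision_ms": ts.restart_decided - ts.failure_notified,
    }
```

For the 10000 -> 10000 topology, `all_to_all_edges(10000, 10000)` gives 100,000,000 producer/consumer pairs, which is why this shape is a useful stress case for the scheduler. Any timestamps passed to `summarize` would have to come from an actual run; none are supplied here.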

People

Assignee: Zhu Zhu

Reporter: Zhu Zhu

Votes: 0

Watchers: 2

Dates

Created: 17/Aug/21 03:30

Updated: 26/Sep/21 06:56

Resolved: 26/Sep/21 06:44