[YUNIKORN-1724] Improve the performance of shim side scheduling cycle - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Won't Do
Affects Version/s: None
Fix Version/s: None
Component/s: shim - kubernetes
Labels:
- pull-request-available

Description

Performance testing of YuniKorn showed that a lot of time is spent in Application.Schedule() in the shim. The cause is that we collect task objects based on their state, which is maintained by fsm.FSM. Even though Application.Schedule() runs only once per second, this is still a problem because of the large number of RWMutex.RLock() calls. With a lot of pods, this consumes a significant amount of CPU time.

Two code paths are affected:
- The first is the switch-case part of Schedule(). To get the number of tasks in the "New" state, we end up scanning all task objects and checking their state.
- The second is retrieving the "New" tasks from the taskMap structure. This is done by GetNewTasks() / getTasks(), which copy the tasks in the requested state to a new slice.

To speed things up, we have to track the "New" tasks in a separate map that is maintained dynamically: a task is added to it when it is created, and removed from it when it leaves the New state or is removed altogether. Getting the number of "New" tasks then becomes trivial and no longer requires iterating over or filtering a slice.
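
A minimal sketch of this idea, under stated assumptions: the `application` and `task` types and the `addTask`, `setState`, `removeTask`, `newTaskCount` and `getNewTasks` methods below are hypothetical simplifications, not the shim's real API, and the state is a plain string instead of fsm.FSM. It only shows how a second map can be kept in sync with task lifecycle events.

```go
package main

import (
	"fmt"
	"sync"
)

// task is a simplified, hypothetical task with a plain string state.
type task struct {
	id    string
	state string
}

// application is a hypothetical stand-in for the shim's Application.
type application struct {
	mu       sync.RWMutex
	taskMap  map[string]*task // all tasks
	newTasks map[string]*task // only tasks currently in the "New" state
}

func newApplication() *application {
	return &application{
		taskMap:  make(map[string]*task),
		newTasks: make(map[string]*task),
	}
}

// addTask registers a task; every task starts in the "New" state.
func (a *application) addTask(id string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	t := &task{id: id, state: "New"}
	a.taskMap[id] = t
	a.newTasks[id] = t
}

// setState would be driven by the state machine's transition callback.
func (a *application) setState(id, state string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	t, ok := a.taskMap[id]
	if !ok {
		return
	}
	t.state = state
	if state == "New" {
		a.newTasks[id] = t
	} else {
		delete(a.newTasks, id)
	}
}

// removeTask drops the task from both maps.
func (a *application) removeTask(id string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	delete(a.taskMap, id)
	delete(a.newTasks, id)
}

// newTaskCount needs a single lock and no scan.
func (a *application) newTaskCount() int {
	a.mu.RLock()
	defer a.mu.RUnlock()
	return len(a.newTasks)
}

// getNewTasks copies only the tracked tasks; no per-task state check.
func (a *application) getNewTasks() []*task {
	a.mu.RLock()
	defer a.mu.RUnlock()
	result := make([]*task, 0, len(a.newTasks))
	for _, t := range a.newTasks {
		result = append(result, t)
	}
	return result
}

func main() {
	app := newApplication()
	app.addTask("task-1")
	app.addTask("task-2")
	app.addTask("task-3")
	app.setState("task-1", "Pending") // leaves the New state
	app.removeTask("task-2")          // removed altogether
	fmt.Println(app.newTaskCount())   // only task-3 is still New
}
```

The trade-off in this sketch is that every place that adds, removes or transitions a task must also update the second map, in exchange for a count and retrieval whose cost depends only on the number of "New" tasks.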

Attachments

getNewTasks.png (59 kB), 05/May/23 12:26, Peter Bacsko

Issue Links

links to

GitHub Pull Request #587

People

Assignee: Peter Bacsko
Reporter: Peter Bacsko
Votes: 0
Watchers: 3

Dates

Created: 05/May/23 12:03
Updated: 20/Nov/23 16:52
Resolved: 26/Jun/23 08:22