Details
Type: Sub-task
Status: Closed
Priority: Major
Resolution: Won't Do
Description
Performance testing of YuniKorn uncovered that a lot of time is spent in Application.Schedule() in the shim. The problem is that we collect task objects based on their state, which is maintained by fsm.FSM. Even though Application.Schedule() runs only once per second, it is still an issue because of the large number of RWMutex.RLock() calls it triggers. With a lot of pods, this consumes a significant amount of CPU time.
Two code paths are affected:
The first is inside the switch-case part of Schedule(). We want to know the number of tasks in the "New" state, and we end up scanning all task objects for their state.
The second is retrieving the "New" tasks from the taskMap structure. This is done by GetNewTasks() / getTasks(), which copies the tasks that match the requested state into a new slice.
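To make the cost concrete, the scan described in the two code paths above can be sketched roughly as follows. This is a simplified illustration, not the shim's actual code: the Task and App types, the tasksInState() helper, and the "Bound" state name are hypothetical, and a plain string guarded by an RWMutex stands in for fsm.FSM.

```go
package main

import (
	"fmt"
	"sync"
)

// Task is a simplified stand-in for the shim's task object (hypothetical).
// A string guarded by an RWMutex plays the role of the FSM-held state.
type Task struct {
	mu    sync.RWMutex
	id    string
	state string
}

// State takes the read lock on every lookup.
func (t *Task) State() string {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return t.state
}

// App is a hypothetical application that keeps all tasks in one map.
type App struct {
	mu      sync.RWMutex
	taskMap map[string]*Task
}

// tasksInState scans every task and copies the matches into a new slice.
// In this sketch that means one RLock per task plus a slice allocation
// on every call, regardless of how many tasks actually match.
func (a *App) tasksInState(state string) []*Task {
	a.mu.RLock()
	defer a.mu.RUnlock()
	var out []*Task
	for _, t := range a.taskMap {
		if t.State() == state {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	app := &App{taskMap: map[string]*Task{}}
	for i := 0; i < 5; i++ {
		id := fmt.Sprintf("task-%d", i)
		state := "New"
		if i%2 == 1 {
			state = "Bound" // any non-New state; the name is an assumption
		}
		app.taskMap[id] = &Task{id: id, state: state}
	}
	// Counting the New tasks still walks the whole map.
	fmt.Println(len(app.tasksInState("New")))
}
```

With many thousands of tasks, a full pass like this on every Schedule() call adds up even at a one-second interval.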
To speed things up, we have to track the "New" tasks in a separate map that is maintained dynamically: a task is inserted when it is added and deleted when it leaves the New state (or when the task gets removed). Knowing how many "New" tasks we have then becomes trivial and does not require slice iteration or filtering.
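A minimal sketch of the tracking map described above, under stated assumptions: the App and Task types, the newTasks field, and the AddTask(), SetState(), RemoveTask(), and NewTaskCount() methods are hypothetical names chosen for illustration, and the "Pending" state is a placeholder. In the real shim the state change would presumably be driven by an FSM transition rather than a direct setter.

```go
package main

import (
	"fmt"
	"sync"
)

const stateNew = "New" // state name taken from the description

// Task is a simplified stand-in for the shim's task object (hypothetical).
type Task struct {
	id    string
	state string
}

// App keeps a second map that holds only the tasks currently in New.
type App struct {
	mu       sync.RWMutex
	taskMap  map[string]*Task
	newTasks map[string]*Task
}

func NewApp() *App {
	return &App{taskMap: map[string]*Task{}, newTasks: map[string]*Task{}}
}

// AddTask registers a task; in this sketch every task starts in New.
func (a *App) AddTask(id string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if _, exists := a.taskMap[id]; exists {
		return
	}
	t := &Task{id: id, state: stateNew}
	a.taskMap[id] = t
	a.newTasks[id] = t
}

// SetState changes the task state and keeps newTasks in sync.
func (a *App) SetState(id, state string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	t, ok := a.taskMap[id]
	if !ok {
		return
	}
	t.state = state
	if state == stateNew {
		a.newTasks[id] = t
	} else {
		delete(a.newTasks, id)
	}
}

// RemoveTask drops the task from both maps.
func (a *App) RemoveTask(id string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	delete(a.taskMap, id)
	delete(a.newTasks, id)
}

// NewTaskCount needs one read lock and one len() call, with no scan.
func (a *App) NewTaskCount() int {
	a.mu.RLock()
	defer a.mu.RUnlock()
	return len(a.newTasks)
}

func main() {
	app := NewApp()
	app.AddTask("task-1")
	app.AddTask("task-2")
	app.AddTask("task-3")
	app.SetState("task-2", "Pending") // placeholder non-New state
	app.RemoveTask("task-3")
	fmt.Println(app.NewTaskCount())
}
```

The trade-off is extra bookkeeping on every add, transition, and removal in exchange for constant-time counting and no per-task locking on the scheduling path.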