  1. Apache YuniKorn
  2. YUNIKORN-1715 Yunikorn performance improvements
  3. YUNIKORN-1724

Improve the performance of shim side scheduling cycle

Details

    Description

      Performance testing of YuniKorn showed that a lot of time is spent in Application.Schedule() in the shim. The cause is that we collect task objects based on their state, which is maintained by fsm.FSM. Even though Application.Schedule() runs only once per second, the large number of RWMutex.RLock() calls is still a problem: with a lot of pods, they consume a significant amount of CPU time.

      Two code paths are affected:
      The first is inside the switch-case part of Schedule(). We want to know the number of tasks in the "New" state, and we end up scanning all task objects for their state.
      The second is retrieving the "New" tasks from the taskMap structure. This is done by GetNewTasks() / getTasks(), which copy the tasks that are in the requested state to a new slice.
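
The scan-and-copy pattern described above can be sketched roughly as follows. This is a simplified illustration, not the actual shim code: the `task` and `application` types are hypothetical stand-ins, and a plain string guarded by a `sync.RWMutex` imitates the state that the real task keeps in fsm.FSM.

```go
package main

import (
	"fmt"
	"sync"
)

// task is a hypothetical stand-in for the shim's task object. The real task
// keeps its state in an fsm.FSM; here a string guarded by an RWMutex imitates
// the read lock that is taken on every state query.
type task struct {
	mu    sync.RWMutex
	state string
}

func (t *task) getState() string {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return t.state
}

// application is a hypothetical stand-in that holds all tasks by ID.
type application struct {
	mu      sync.RWMutex
	taskMap map[string]*task
}

// getTasks copies every task in the given state into a new slice.
// Each call takes one read lock per task and allocates a fresh slice,
// no matter how few tasks actually match.
func (a *application) getTasks(state string) []*task {
	a.mu.RLock()
	defer a.mu.RUnlock()
	var out []*task
	for _, t := range a.taskMap {
		if t.getState() == state {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	app := &application{taskMap: map[string]*task{
		"t1": {state: "New"},
		"t2": {state: "Bound"},
		"t3": {state: "New"},
	}}
	// Even just counting the "New" tasks requires a full scan.
	fmt.Println(len(app.getTasks("New"))) // prints 2
}
```

With N tasks, every scheduling cycle performs N lock/unlock pairs here, even when no task is in the "New" state.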

      To speed things up, we have to track the "New" tasks in a separate map that is maintained dynamically: a task is added to it when it is created and removed from it when it leaves the New state (or when the task itself gets removed). Knowing how many "New" tasks we have then becomes trivial and no longer requires iterating over or filtering all tasks.
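
A minimal sketch of this idea, under stated assumptions: the `application` and `task` types, the `newTasks` field and the `addTask` / `setState` / `removeTask` / `newTaskCount` helpers are hypothetical names chosen for illustration, and the state is a plain string rather than an fsm.FSM. In the real shim the bookkeeping would presumably be tied to the task's state transitions.

```go
package main

import (
	"fmt"
	"sync"
)

const stateNew = "New" // assumed name of the initial task state

type task struct {
	id    string
	state string
}

// application keeps a second map that only holds tasks in the New state.
type application struct {
	mu       sync.RWMutex
	taskMap  map[string]*task // all tasks
	newTasks map[string]*task // subset of taskMap: tasks still in New
}

func newApplication() *application {
	return &application{
		taskMap:  make(map[string]*task),
		newTasks: make(map[string]*task),
	}
}

// addTask registers a task; every task starts out in the New state.
func (a *application) addTask(id string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	t := &task{id: id, state: stateNew}
	a.taskMap[id] = t
	a.newTasks[id] = t
}

// setState records a transition and drops the task from newTasks
// as soon as it leaves the New state.
func (a *application) setState(id, state string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	t, ok := a.taskMap[id]
	if !ok {
		return
	}
	t.state = state
	if state != stateNew {
		delete(a.newTasks, id)
	}
}

// removeTask deletes the task from both maps.
func (a *application) removeTask(id string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	delete(a.taskMap, id)
	delete(a.newTasks, id)
}

// newTaskCount needs a single lock and no iteration.
func (a *application) newTaskCount() int {
	a.mu.RLock()
	defer a.mu.RUnlock()
	return len(a.newTasks)
}

func main() {
	app := newApplication()
	app.addTask("t1")
	app.addTask("t2")
	app.addTask("t3")
	app.setState("t1", "Pending") // t1 leaves New
	app.removeTask("t2")          // t2 is removed entirely
	fmt.Println(app.newTaskCount()) // prints 1
}
```

The trade-off is a small amount of extra bookkeeping on every add, transition and removal in exchange for a constant-time count and a lookup that touches only the tasks that are actually in the "New" state.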

      Attachments

        1. getNewTasks.png
          59 kB
          Peter Bacsko

            People

              pbacsko Peter Bacsko