  1. Apache YuniKorn
  2. YUNIKORN-1715 Yunikorn performance improvements
  3. YUNIKORN-1724

Improve the performance of shim side scheduling cycle


Details

    Description

      Performance testing of YuniKorn uncovered that a lot of time is spent in Application.Schedule() in the shim. The cause is that we collect task objects based on their state, which is maintained by fsm.FSM. Even though Application.Schedule() runs only once per second, it is still an issue because of the large number of RWMutex.RLock() calls. With a lot of pods, this consumes a significant amount of CPU time.

      Two code paths are affected:
      The first is the switch-case part of Schedule(). We want to know the number of tasks in the "New" state, and we end up scanning all task objects for their state.
      The second is retrieving the "New" tasks from the taskMap structure. This is done by GetNewTasks() / getTasks(), which copy tasks to a new slice based on their state.

      To speed things up, we have to track the "New" tasks in a separate map that is maintained dynamically: a task is inserted when it is added and dropped when it leaves the New state (or when the task itself is removed). Knowing how many "New" tasks we have then becomes trivial and no longer requires slice iteration/filtering.
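
The idea can be sketched roughly as follows. This is a minimal, simplified illustration and not the shim's actual code: the Task and Application types are reduced stand-ins, and newTasks, AddTask, SetTaskState, RemoveTask and NewTaskCount are hypothetical names chosen for the example. Only taskMap and GetNewTasks() are taken from the description above. In the real shim, state changes are driven by fsm.FSM, so the map update would presumably hook into those transitions instead of a plain setter.

```go
package main

import (
	"fmt"
	"sync"
)

// Only the two states needed for the sketch.
const (
	StateNew     = "New"
	StatePending = "Pending"
)

// Task is a simplified, hypothetical stand-in for the shim's task object.
type Task struct {
	ID    string
	state string
}

// Application keeps every task in taskMap and, in addition, the subset
// that is still in the New state in newTasks (assumed field name).
type Application struct {
	mu       sync.RWMutex
	taskMap  map[string]*Task
	newTasks map[string]*Task
}

func NewApplication() *Application {
	return &Application{
		taskMap:  make(map[string]*Task),
		newTasks: make(map[string]*Task),
	}
}

// AddTask registers a task; every task starts in the New state.
func (a *Application) AddTask(id string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	t := &Task{ID: id, state: StateNew}
	a.taskMap[id] = t
	a.newTasks[id] = t
}

// SetTaskState changes a task's state and keeps newTasks in sync.
func (a *Application) SetTaskState(id, state string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	t, ok := a.taskMap[id]
	if !ok {
		return
	}
	t.state = state
	if state == StateNew {
		a.newTasks[id] = t
	} else {
		delete(a.newTasks, id)
	}
}

// RemoveTask drops the task from both maps.
func (a *Application) RemoveTask(id string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	delete(a.taskMap, id)
	delete(a.newTasks, id)
}

// NewTaskCount needs one lock acquisition and no scan over all tasks.
func (a *Application) NewTaskCount() int {
	a.mu.RLock()
	defer a.mu.RUnlock()
	return len(a.newTasks)
}

// GetNewTasks copies only the tracked New tasks; no per-task state check.
func (a *Application) GetNewTasks() []*Task {
	a.mu.RLock()
	defer a.mu.RUnlock()
	tasks := make([]*Task, 0, len(a.newTasks))
	for _, t := range a.newTasks {
		tasks = append(tasks, t)
	}
	return tasks
}

func main() {
	app := NewApplication()
	app.AddTask("task-1")
	app.AddTask("task-2")
	app.AddTask("task-3")
	app.SetTaskState("task-1", StatePending) // leaves the New state
	app.RemoveTask("task-2")                 // removed entirely
	fmt.Println(app.NewTaskCount(), len(app.GetNewTasks()))
}
```

With this layout, the cost of counting or fetching "New" tasks depends on the number of tasks currently in that state rather than on the total number of tasks, and the per-task state lookups disappear from the scheduling cycle.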

          People

            pbacsko Peter Bacsko