Apache YuniKorn / YUNIKORN-1605

Add unit tests for preempted placeholders

Details

    Description

      First test generic placeholder pre-emption:

      Create a job with placeholders that are:
      small enough to fit in the queue
      larger than the free space of the queue.
      This leaves the placeholders running for a long time, as they need to time out.
      It should produce a node that is fully loaded with gang placeholders.
      Then create a daemon set that must run on that node.
      The daemon set pod must be large enough that it does not fit on the node.
      That should trigger the placeholder pre-emption.
      Before the fix: the placeholder data for the app does not show that the placeholder was removed.
      After the fix: the placeholder data for the app shows a removed placeholder.

      Second test is node removal with placeholders:

      Create a job with placeholders that are:
      small enough to fit in the queue
      larger than the free space of the queue.
      This leaves the placeholders running for a long time, as they need to time out.
      Remove a node that has at least one placeholder.
      Before the fix: the placeholder data for the app does not show that the placeholder was removed.
      After the fix: the placeholder data for the app shows a removed placeholder.

      Third test is killing a placeholder:

      Create a job with placeholders that are:
      small enough to fit in the queue
      larger than the free space of the queue.
      This leaves the placeholders running for a long time, as they need to time out.
      Mimic removal of the placeholder via kubectl by creating an allocation release request with the termination type STOPPED_BY_RM and sending it to the partition.
      Before the fix: the placeholder data for the app does not show that the placeholder was removed.
      After the fix: the placeholder data for the app shows a removed placeholder.
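
All three tests make the same check: after a placeholder is removed, the application's placeholder data should record the removal. The following is a minimal, self-contained sketch of that check using a toy model. It does not use the YuniKorn code base: the types `app` and `placeholderData`, the field `Removed`, the function `removePlaceholder`, and the trigger labels are all hypothetical names chosen for illustration.

```go
package main

import "fmt"

// removalTrigger labels the three ways the ticket removes a placeholder.
// These are illustrative labels, not YuniKorn identifiers.
type removalTrigger string

const (
	triggerPreempted   removalTrigger = "preempted for daemon set pod"
	triggerNodeRemoved removalTrigger = "node removed"
	triggerStoppedByRM removalTrigger = "release with STOPPED_BY_RM"
)

// placeholderData is a hypothetical per-application summary.
type placeholderData struct {
	Count   int // placeholders created for the application
	Removed int // placeholders removed before being replaced
}

// app is a toy application that tracks its live placeholders by key.
type app struct {
	live map[string]bool
	data placeholderData
}

// newApp creates a toy application with one live placeholder per key.
func newApp(keys ...string) *app {
	a := &app{live: make(map[string]bool)}
	for _, k := range keys {
		a.live[k] = true
	}
	a.data.Count = len(keys)
	return a
}

// removePlaceholder drops a live placeholder and records the removal.
// It returns false for an unknown key, so a repeated release is not counted twice.
// The trigger is ignored here: in this model every trigger is recorded the same way.
func (a *app) removePlaceholder(key string, _ removalTrigger) bool {
	if !a.live[key] {
		return false
	}
	delete(a.live, key)
	a.data.Removed++
	return true
}

func main() {
	triggers := []removalTrigger{triggerPreempted, triggerNodeRemoved, triggerStoppedByRM}
	for _, trig := range triggers {
		a := newApp("ph-1", "ph-2")
		a.removePlaceholder("ph-1", trig)
		fmt.Printf("%s: count=%d removed=%d live=%d\n",
			trig, a.data.Count, a.data.Removed, len(a.live))
	}
}
```

In this model the "before the fix" behaviour corresponds to `Removed` staying at 0 after a removal. The real unit tests would assert on the scheduler's own placeholder data instead.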

      Working config:

      queue quota max size: 16 GB / 16 CPU
      nodes: 2 * 8 GB / 8 CPU
      create an application with allocation: 4 GB / 4 CPU
      create a gang application requesting: 7 * 2 GB / 2 CPU
      create a daemon set pod for one of the nodes asking for 1 GB / 1 CPU

          People

            mani Manikandan R