Apache YuniKorn / YUNIKORN-677

Potential resource leak when complete and allocate pod happens simultaneously

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • None
    • 0.11
    • None

    Description

      Suppose an app has one pod that needs to be scheduled. The shim submits the app to the core and starts scheduling the pod. On the shim side, the pod is a task in the Scheduling state. A race occurs if the following two things happen simultaneously:

      1. The user deletes the pod. This triggers a CompleteTask event on the shim side, and the shim sends a ReleaseAllocationAskRequest to the core.
      2. Before handling the ReleaseAllocationAskRequest from the shim, the core makes an allocation for the pod and sends the Allocation to the shim.

      The sequence is then: the core generates an allocation on a node; the core receives the release request and deletes the pending ask; the shim receives the new allocation but ignores it, because the pod has already been deleted. The allocation is left over, which causes the resource leak.
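
      The interleaving described above can be replayed deterministically with a small Go sketch. Everything in it is hypothetical and only illustrates the report: `Core`, `Shim`, `runRace`, and the `releaseStale` switch are made-up names, not YuniKorn types or APIs. The `releaseStale` branch shows one possible mitigation (the shim hands back an allocation it receives for an already completed task); it is an assumption for illustration, not necessarily the fix that was applied for this issue.

```go
package main

import "fmt"

// Core is a toy stand-in for the scheduler core (hypothetical, not YuniKorn's API).
type Core struct {
	pendingAsks map[string]bool   // task ID -> ask still pending
	allocations map[string]string // task ID -> node holding the allocation
}

// Allocate places an allocation for a task whose ask is still pending.
func (c *Core) Allocate(taskID, node string) bool {
	if !c.pendingAsks[taskID] {
		return false
	}
	c.allocations[taskID] = node
	return true
}

// ReleaseAsk drops only the pending ask; an allocation already made stays.
func (c *Core) ReleaseAsk(taskID string) { delete(c.pendingAsks, taskID) }

// ReleaseAllocation frees the resources held for a task.
func (c *Core) ReleaseAllocation(taskID string) { delete(c.allocations, taskID) }

// Shim is a toy stand-in for the Kubernetes shim (hypothetical).
type Shim struct {
	core         *Core
	completed    map[string]bool // tasks whose pod has been deleted
	releaseStale bool            // assumed mitigation switch
}

// CompleteTask records the pod deletion; the release request is now in flight.
func (s *Shim) CompleteTask(taskID string) { s.completed[taskID] = true }

// OnAllocation handles an allocation arriving from the core.
func (s *Shim) OnAllocation(taskID string) {
	if !s.completed[taskID] {
		return // normal path: the pod would be bound to the node here
	}
	// The pod is already gone. Ignoring the allocation leaks it;
	// the assumed mitigation hands it back to the core instead.
	if s.releaseStale {
		s.core.ReleaseAllocation(taskID)
	}
}

// runRace replays the interleaving from the report and returns the number
// of allocations still held by the core afterwards.
func runRace(releaseStale bool) int {
	core := &Core{
		pendingAsks: map[string]bool{"task-1": true},
		allocations: map[string]string{},
	}
	shim := &Shim{core: core, completed: map[string]bool{}, releaseStale: releaseStale}

	shim.CompleteTask("task-1")       // 1. pod deleted, release request in flight
	core.Allocate("task-1", "node-1") // 2. core allocates before seeing the release
	core.ReleaseAsk("task-1")         // core handles the release: only the ask goes away
	shim.OnAllocation("task-1")       // shim receives the allocation for a deleted pod
	return len(core.allocations)
}

func main() {
	fmt.Println("leaked allocations, shim ignores:", runRace(false))
	fmt.Println("leaked allocations, shim releases:", runRace(true))
}
```

      In this model, ignoring the late allocation leaves one entry in the core's allocation map, while releasing it leaves none. The sketch fixes the event order by hand instead of using goroutines so that the problematic interleaving is reproduced on every run.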

            People

              Assignee: Weiwei Yang (wwei)
              Reporter: Weiwei Yang (wwei)