Details
-
Bug
-
Status: Closed
-
Blocker
-
Resolution: Fixed
-
None
-
None
Description
Let's say we have an app that has 1 pod needs for scheduling. The shim submits an app to the core, and start the schedule the pod. In the shim side, this is a task in the Scheduling state. Then we have a race if the following things happen simultaneously:
- User deletes the pod, this triggers a CompleteTask event in the shim side, and the shim will send a ReleaseAllocationAskRequest to the core.
- Before handling the ReleaseAllocationAskRequest from the shim, the core made an allocation for the given pod and send an Allocation to the shim
then the core generates an allocation on a node, core receives the release request and deletes the pending ask; the shim side receives the new allocation, but since the pod has already been deleted so the shim ignores this allocation. In this case, the allocation will be left-over causing the resource leak.
Attachments
Issue Links
- blocks
-
YUNIKORN-686 [Regression] Placeholders for completed applications are recreated during recovery
- Closed
- breaks
-
YUNIKORN-741 Regression: occupied resources miscalculated sometimes for yunikorn pods
- Closed
- links to