Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
A complex scenario with several things happening at once:
- one application has multiple pending requests
- two or more of those pending requests are reserved
- one of the reserved requests is being allocated (the scheduler is done and the cache confirmation is called asynchronously)
- the shim cancels the request being allocated after the scheduler is done but before the cache confirms
The cancellation by the shim triggers one update and the cache update triggers a second one. Together, these two updates decrement the reservation counters twice for the same request.
The side effect is that the node reserved by the remaining ask (the one that was not removed) is skipped until that ask is allocated on a different node. If that takes a while (for instance, while waiting for a scale up), scheduling is impacted.
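The double decrement described above can be illustrated with a minimal Go sketch. This is not the scheduler's actual code: `reservationTracker`, `unreserveNaive`, `unreserveGuarded` and `runScenario` are hypothetical names invented for illustration, and the guard shown is only one possible way to make the removal idempotent, not necessarily the fix that was applied.

```go
package main

import "fmt"

// reservationTracker is a hypothetical stand-in for per-application
// reservation bookkeeping. It is not a type from the real scheduler.
type reservationTracker struct {
	reserved map[string]string // ask key -> reserved node ID
	count    int               // number of reservations the application believes it holds
}

func newTracker() *reservationTracker {
	return &reservationTracker{reserved: make(map[string]string)}
}

// reserve records a reservation for an ask on a node.
func (t *reservationTracker) reserve(askKey, nodeID string) {
	t.reserved[askKey] = nodeID
	t.count++
}

// unreserveNaive decrements the counter on every call, even when the
// reservation was already removed by an earlier update.
func (t *reservationTracker) unreserveNaive(askKey string) {
	delete(t.reserved, askKey)
	t.count--
}

// unreserveGuarded decrements only when the reservation still exists,
// so a second update for the same ask is a no-op.
func (t *reservationTracker) unreserveGuarded(askKey string) {
	if _, ok := t.reserved[askKey]; !ok {
		return
	}
	delete(t.reserved, askKey)
	t.count--
}

// runScenario replays the sequence from the description: two reserved
// asks, then two updates (shim cancel and cache update) for the same ask.
// It returns the counter value afterwards.
func runScenario(guarded bool) int {
	t := newTracker()
	t.reserve("ask-1", "node-1")
	t.reserve("ask-2", "node-2")

	unreserve := t.unreserveNaive
	if guarded {
		unreserve = t.unreserveGuarded
	}
	unreserve("ask-1") // update triggered by the shim cancellation
	unreserve("ask-1") // update triggered by the cache update

	return t.count
}

func main() {
	fmt.Println("naive counter:  ", runScenario(false)) // 0, although ask-2 is still reserved
	fmt.Println("guarded counter:", runScenario(true))  // 1, matching the remaining reservation
}
```

In the naive variant the counter reaches zero while `ask-2` still holds its reservation on `node-2`, which mirrors the mismatch between the counters and the real reservations described in this issue.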