Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
An application may fail for a number of reasons. For example:
- In gang scheduling, the placeholders expire before all of them can be allocated
- When no placement rules are defined (i.e. static queues are used), an application is submitted to a non-existent queue
- The total amount of resources requested by a gang-scheduled app exceeds the capacity of the queue
YK's finite state machine has Failed as a terminal state for an app, meaning that YK will never try to bring a failed app back. As a consequence, the pods of such a failed app stay stuck in Pending indefinitely. A better behavior is for YK to mark those pods as failed too, and to pass the reason for the failure on to those pods.
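A minimal sketch of the proposed behavior, to make the idea concrete. It is not YK's actual code and does not use the Kubernetes client: the `Application`, `Pod`, and `Fail` names, the state and phase constants, and the reason string are all hypothetical stand-ins chosen for illustration. The assumption is that when an app enters the terminal Failed state, every pod still in Pending is marked failed and given the app's failure reason.

```go
package main

import "fmt"

// PodPhase is a hypothetical stand-in for the pod phases relevant here.
type PodPhase string

const (
	PodPending PodPhase = "Pending"
	PodRunning PodPhase = "Running"
	PodFailed  PodPhase = "Failed"
)

// Pod is a simplified, hypothetical pod record (not the Kubernetes type).
type Pod struct {
	Name    string
	Phase   PodPhase
	Reason  string
	Message string
}

// AppState is a hypothetical subset of the application states.
type AppState string

const (
	AppRunning AppState = "Running"
	AppFailed  AppState = "Failed" // terminal: the app is never revived
)

// Application groups the pods that belong to one app.
type Application struct {
	ID    string
	State AppState
	Pods  []*Pod
}

// Fail moves the app to the terminal Failed state and propagates the
// failure reason to every pod that is still pending. It returns the
// number of pods it marked as failed. Calling it on an app that has
// already failed is a no-op.
func (a *Application) Fail(reason, message string) int {
	if a.State == AppFailed {
		return 0
	}
	a.State = AppFailed

	marked := 0
	for _, p := range a.Pods {
		if p.Phase != PodPending {
			continue // leave pods that are not pending untouched
		}
		p.Phase = PodFailed
		p.Reason = reason
		p.Message = message
		marked++
	}
	return marked
}

func main() {
	app := &Application{
		ID:    "app-1",
		State: AppRunning,
		Pods: []*Pod{
			{Name: "driver", Phase: PodPending},
			{Name: "executor-1", Phase: PodPending},
		},
	}

	n := app.Fail("ApplicationFailed", "placeholders expired before all were allocated")
	fmt.Printf("app %s is %s, %d pod(s) marked failed\n", app.ID, app.State, n)
	for _, p := range app.Pods {
		fmt.Printf("  %s: %s (%s: %s)\n", p.Name, p.Phase, p.Reason, p.Message)
	}
}
```

In a real scheduler the pod update would go through the Kubernetes API and could be rejected, so a production version would also need to handle update errors and retries rather than mutate in-memory structs.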
Issue Links
- causes: YUNIKORN-675 Pod status update could fail due to conflicts (Closed)