Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Duplicate
Description
- Tried submitting an app (gang-app-timeout-no-gang.yaml) with min member == parallelism. The app is rejected by the scheduler. After this, no app that is submitted gets scheduled.
- The app is rejected with the error below after the placeholder pods time out.
2021-03-16T03:12:41.214Z INFO scheduler/context.go:674 Invalid ask add requested by shim {"partition": "[mycluster]default", "applicationID": "gang-app-timeout-1009", "askKey": "cf58523b-9750-40b8-b148-b3319bdf3edf", "error": "failed to find application gang-app-timeout-1009, for allocation ask cf58523b-9750-40b8-b148-b3319bdf3edf"}
2021-03-16T03:12:41.214Z WARN cache/task.go:415 task allocation UUID is empty, sending this release request to yunikorn-core could cause all allocations of this app get released. skip this request, this may cause some resource leak. check the logs for more info! {"applicationID": "gang-app-timeout-1009", "taskID": "cf58523b-9750-40b8-b148-b3319bdf3edf", "taskAlias": "fifo/gang-app-timeout-1009-h5qlh", "allocationUUID": "", "task": "Failed"}
2021-03-16T03:12:41.214Z ERROR cache/task.go:243 task failed {"appID": "gang-app-timeout-1009", "taskID": "cf58523b-9750-40b8-b148-b3319bdf3edf", "reason": "task fifo/gang-app-timeout-1009-h5qlh failed because it is rejected by scheduler"}
github.com/apache/incubator-yunikorn-k8shim/pkg/cache.(*Task).handleFailEvent
    /grid/0/jenkins/workspace/workspace/App_builds/SOURCES/yunikorn-k8shim/pkg/cache/task.go:243
github.com/looplab/fsm.(*FSM).afterEventCallbacks
    /grid/0/jenkins/go/pkg/mod/github.com/looplab/fsm@v0.1.0/fsm.go:414
github.com/looplab/fsm.(*FSM).Event.func1
    /grid/0/jenkins/go/pkg/mod/github.com/looplab/fsm@v0.1.0/fsm.go:309
github.com/looplab/fsm.transitionerStruct.transition
    /grid/0/jenkins/go/pkg/mod/github.com/looplab/fsm@v0.1.0/fsm.go:354
github.com/looplab/fsm.(*FSM).doTransition
    /grid/0/jenkins/go/pkg/mod/github.com/looplab/fsm@v0.1.0/fsm.go:339
github.com/looplab/fsm.(*FSM).Event
    /grid/0/jenkins/go/pkg/mod/github.com/looplab/fsm@v0.1.0/fsm.go:321
github.com/apache/incubator-yunikorn-k8shim/pkg/cache.(*Task).handle
    /grid/0/jenkins/workspace/workspace/App_builds/SOURCES/yunikorn-k8shim/pkg/cache/task.go:152
github.com/apache/incubator-yunikorn-k8shim/pkg/cache.(*Context).TaskEventHandler.func1
    /grid/0/jenkins/workspace/workspace/App_builds/SOURCES/yunikorn-k8shim/pkg/cache/context.go:770
github.com/apache/incubator-yunikorn-k8shim/pkg/dispatcher.Start.func1
    /grid/0/jenkins/workspace/workspace/App_builds/SOURCES/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:194
2021-03-16T03:12:41.896Z INFO general/general.go:221 task completes {"appType": "general", "namespace": "fifo", "podName": "tg-timeout-1009-gang-app-timeout-1009-0", "podUID": "11c4a9dd-7ec4-4dee-8e36-eb0dc74bb6d1", "podStatus": "Failed"}
- After this error, no newly submitted app is scheduled; all of its pods stay Pending:
gang-app-timeout-1010-dph4q 0/1 Pending 0 11m
gang-app-timeout-1010-f7zmp 0/1 Pending 0 11m
gang-app-timeout-1010-xmzfk 0/1 Pending 0 11m
tg-timeout-1010-gang-app-timeout-1010-0 0/1 Pending 0 11m
tg-timeout-1010-gang-app-timeout-1010-1 0/1 Pending 0 11m
tg-timeout-1010-gang-app-timeout-1010-2 0/1 Pending 0 11m
Complete logs are attached as yk.log.
The stack trace is attached as stack.
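When going through a long log such as the attached one, it can help to pull out which applications hit warnings or errors. The following is a minimal sketch, not part of YuniKorn: it assumes each entry sits on one line in the layout seen above (timestamp, level, caller, message, JSON fields), and the names `parse_entry` and `affected_apps` are made up for illustration. The sample lines are copied from this report.

```python
import json
import re

# Assumed entry layout: <timestamp> <LEVEL> <file.go:line> <message> <JSON fields>
ENTRY = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\S+Z)\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<caller>\S+:\d+)\s+"
    r"(?P<msg>.*?)\s*"
    r"(?P<fields>\{.*\})"
)


def parse_entry(line):
    """Split one log line into its parts; return None if it does not match."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    try:
        fields = json.loads(m.group("fields"))
    except json.JSONDecodeError:
        return None
    return {
        "ts": m.group("ts"),
        "level": m.group("level"),
        "caller": m.group("caller"),
        "msg": m.group("msg"),
        "fields": fields,
    }


def affected_apps(lines):
    """Collect application IDs from WARN/ERROR entries or entries with an "error" field."""
    apps = []
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue  # stack frames and other non-entry lines are skipped
        fields = entry["fields"]
        # The shim logs the ID under either "applicationID" or "appID".
        app_id = fields.get("applicationID") or fields.get("appID")
        flagged = entry["level"] in ("WARN", "ERROR") or "error" in fields
        if app_id and flagged and app_id not in apps:
            apps.append(app_id)
    return apps


SAMPLE = [
    '2021-03-16T03:12:41.214Z INFO scheduler/context.go:674 Invalid ask add requested by shim {"partition": "[mycluster]default", "applicationID": "gang-app-timeout-1009", "askKey": "cf58523b-9750-40b8-b148-b3319bdf3edf", "error": "failed to find application gang-app-timeout-1009, for allocation ask cf58523b-9750-40b8-b148-b3319bdf3edf"}',
    '2021-03-16T03:12:41.214Z ERROR cache/task.go:243 task failed {"appID": "gang-app-timeout-1009", "taskID": "cf58523b-9750-40b8-b148-b3319bdf3edf", "reason": "task fifo/gang-app-timeout-1009-h5qlh failed because it is rejected by scheduler"}',
    '2021-03-16T03:12:41.896Z INFO general/general.go:221 task completes {"appType": "general", "namespace": "fifo", "podName": "tg-timeout-1009-gang-app-timeout-1009-0", "podUID": "11c4a9dd-7ec4-4dee-8e36-eb0dc74bb6d1", "podStatus": "Failed"}',
]

if __name__ == "__main__":
    print(affected_apps(SAMPLE))
```

Running the same function over every line of a full log file would list each application that was flagged at least once, in order of first appearance.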
Attachments
Issue Links
- duplicates YUNIKORN-567 Queue resources are not cleaned up after placeholder cleanup (Closed)