Details
- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Duplicate
- None
- None
Description
Apply the job templates below, in order, to reproduce the issue.
1. First application with gang scheduling annotations
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-sleep-job-1
spec:
  completions: 2
  parallelism: 2
  template:
    metadata:
      labels:
        app: sleep
        applicationId: "batch-sleep-job-1"
        queue: root.sandbox
      annotations:
        yunikorn.apache.org/task-group-name: tg1
        yunikorn.apache.org/task-groups: |-
          [{
              "name": "tg1",
              "minMember": 2,
              "minResource": {
                "cpu": "100m",
                "memory": "500M"
              },
              "nodeSelector": {},
              "tolerations": []
          }]
    spec:
      schedulerName: yunikorn
      restartPolicy: Never
      containers:
        - name: sleep300
          image: "alpine:latest"
          command: ["sleep", "300"]
          resources:
            requests:
              cpu: "100m"
              memory: "500M"
2. Second application to the same task group
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-sleep-job-2
spec:
  completions: 4
  parallelism: 4
  template:
    metadata:
      labels:
        app: sleep
        applicationId: "batch-sleep-job-2"
        queue: root.sandbox
      annotations:
        yunikorn.apache.org/task-group-name: tg1
        yunikorn.apache.org/task-groups: |-
          [{
              "name": "tg1",
              "minMember": 2,
              "minResource": {
                "cpu": "100m",
                "memory": "500M"
              },
              "nodeSelector": {},
              "tolerations": []
          }]
    spec:
      schedulerName: yunikorn
      restartPolicy: Never
      containers:
        - name: sleep300
          image: "alpine:latest"
          command: ["sleep", "300"]
          resources:
            requests:
              cpu: "100m"
              memory: "500M"
3. Third application to the same task group
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-sleep-job-3
spec:
  completions: 10
  parallelism: 10
  template:
    metadata:
      labels:
        app: sleep
        applicationId: "batch-sleep-job-3"
        queue: root.sandbox
      annotations:
        yunikorn.apache.org/task-group-name: tg1
        yunikorn.apache.org/task-groups: |-
          [{
              "name": "tg1",
              "minMember": 3,
              "minResource": {
                "cpu": "100m",
                "memory": "500M"
              },
              "nodeSelector": {},
              "tolerations": []
          }]
    spec:
      schedulerName: yunikorn
      restartPolicy: Never
      containers:
        - name: sleep300
          image: "alpine:latest"
          command: ["sleep", "300"]
          resources:
            requests:
              cpu: "100m"
              memory: "500M"
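The three templates above differ only in the Job name (reused as the applicationId), the completions/parallelism count, and the task group's minMember. As a convenience for anyone re-running the reproduction, a minimal Python sketch that builds the same manifests might look like the following. The helper name make_sleep_job and the JOBS list are illustrative and not part of the original report; the output is JSON, which should be saved to a file and applied in the same way as the YAML templates.

```python
import json


def make_sleep_job(name: str, parallelism: int, min_member: int) -> dict:
    """Build a batch/v1 Job manifest mirroring the templates in this report.

    Hypothetical helper: only the name, parallelism and minMember vary;
    every other value is copied from the templates above.
    """
    task_groups = [{
        "name": "tg1",
        "minMember": min_member,
        "minResource": {"cpu": "100m", "memory": "500M"},
        "nodeSelector": {},
        "tolerations": [],
    }]
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "completions": parallelism,
            "parallelism": parallelism,
            "template": {
                "metadata": {
                    "labels": {
                        "app": "sleep",
                        "applicationId": name,
                        "queue": "root.sandbox",
                    },
                    "annotations": {
                        "yunikorn.apache.org/task-group-name": "tg1",
                        # The task-groups annotation is a JSON string, as in the templates.
                        "yunikorn.apache.org/task-groups": json.dumps(task_groups),
                    },
                },
                "spec": {
                    "schedulerName": "yunikorn",
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "sleep300",
                        "image": "alpine:latest",
                        "command": ["sleep", "300"],
                        "resources": {"requests": {"cpu": "100m", "memory": "500M"}},
                    }],
                },
            },
        },
    }


# (name, completions/parallelism, minMember) for the three jobs in this report
JOBS = [
    ("batch-sleep-job-1", 2, 2),
    ("batch-sleep-job-2", 4, 2),
    ("batch-sleep-job-3", 10, 3),
]

if __name__ == "__main__":
    # Print the first manifest; change the index to emit the second or third job.
    print(json.dumps(make_sleep_job(*JOBS[0]), indent=2))
```

Emitting one manifest at a time keeps the submissions sequential, which matches the order used in the reproduction steps.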
After all three are applied, the pods of the third application remain in Pending state even though its placeholders were created and terminated.
NAME↑                    READY  STATUS     RS  CPU  MEM  %CPU/R  %MEM/R  %CPU/L  %MEM/L  IP               NODE                                             QOS  AGE
batch-sleep-job-1-7lrd5  0/1    Completed  0   n/a  n/a  n/a     n/a     n/a     n/a     100.100.142.208  ip-10-192-143-108.ca-central-1.compute.internal  BU   18m
batch-sleep-job-1-lw4t9  0/1    Completed  0   n/a  n/a  n/a     n/a     n/a     n/a     100.100.134.213  ip-10-192-136-201.ca-central-1.compute.internal  BU   18m
batch-sleep-job-2-c95dg  0/1    Completed  0   n/a  n/a  n/a     n/a     n/a     n/a     100.100.142.210  ip-10-192-143-108.ca-central-1.compute.internal  BU   17m
batch-sleep-job-2-vnfjb  0/1    Completed  0   n/a  n/a  n/a     n/a     n/a     n/a     100.100.142.211  ip-10-192-143-108.ca-central-1.compute.internal  BU   17m
batch-sleep-job-2-x4mcz  0/1    Completed  0   n/a  n/a  n/a     n/a     n/a     n/a     100.100.134.216  ip-10-192-136-201.ca-central-1.compute.internal  BU   17m
batch-sleep-job-2-ztnfq  0/1    Completed  0   n/a  n/a  n/a     n/a     n/a     n/a     100.100.134.217  ip-10-192-136-201.ca-central-1.compute.internal  BU   17m
batch-sleep-job-3-7tp5t  0/0    Pending    0   n/a  n/a  n/a     n/a     n/a     n/a     n/a              n/a                                              BU   16m
batch-sleep-job-3-59mnj  0/0    Pending    0   n/a  n/a  n/a     n/a     n/a     n/a     n/a              n/a                                              BU   16m
batch-sleep-job-3-bm4fd  0/0    Pending    0   n/a  n/a  n/a     n/a     n/a     n/a     n/a              n/a                                              BU   16m
batch-sleep-job-3-c4mxg  0/0    Pending    0   n/a  n/a  n/a     n/a     n/a     n/a     n/a              n/a                                              BU   16m
batch-sleep-job-3-cljfj  0/0    Pending    0   n/a  n/a  n/a     n/a     n/a     n/a     n/a              n/a                                              BU   16m
batch-sleep-job-3-gcvnp  0/0    Pending    0   n/a  n/a  n/a     n/a     n/a     n/a     n/a              n/a                                              BU   16m
batch-sleep-job-3-gwgnn  0/0    Pending    0   n/a  n/a  n/a     n/a     n/a     n/a     n/a              n/a                                              BU   16m
batch-sleep-job-3-kj88t  0/0    Pending    0   n/a  n/a  n/a     n/a     n/a     n/a     n/a              n/a                                              BU   16m
batch-sleep-job-3-p8c7w  0/0    Pending    0   n/a  n/a  n/a     n/a     n/a     n/a     n/a              n/a                                              BU   16m
batch-sleep-job-3-td575  0/0    Pending    0   n/a  n/a  n/a     n/a     n/a     n/a     n/a              n/a                                              BU   16m
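To make the symptom in the listing above easier to check at a glance, the pods can be grouped by job and status. The sketch below is illustrative only: the summarize helper is hypothetical, the job name is derived by stripping the last dash-separated suffix from the pod name, and the (name, status) pairs are transcribed from the listing rather than fetched from a cluster.

```python
from collections import Counter, defaultdict


def summarize(pods):
    """Group (pod_name, status) pairs into {job_name: {status: count}}."""
    summary = defaultdict(Counter)
    for pod_name, status in pods:
        # Assumption: the job name is the pod name minus its random suffix.
        job_name = pod_name.rsplit("-", 1)[0]
        summary[job_name][status] += 1
    return {job: dict(counts) for job, counts in summary.items()}


# Pod names and statuses transcribed from the listing in this report.
PODS = [
    ("batch-sleep-job-1-7lrd5", "Completed"),
    ("batch-sleep-job-1-lw4t9", "Completed"),
    ("batch-sleep-job-2-c95dg", "Completed"),
    ("batch-sleep-job-2-vnfjb", "Completed"),
    ("batch-sleep-job-2-x4mcz", "Completed"),
    ("batch-sleep-job-2-ztnfq", "Completed"),
    ("batch-sleep-job-3-7tp5t", "Pending"),
    ("batch-sleep-job-3-59mnj", "Pending"),
    ("batch-sleep-job-3-bm4fd", "Pending"),
    ("batch-sleep-job-3-c4mxg", "Pending"),
    ("batch-sleep-job-3-cljfj", "Pending"),
    ("batch-sleep-job-3-gcvnp", "Pending"),
    ("batch-sleep-job-3-gwgnn", "Pending"),
    ("batch-sleep-job-3-kj88t", "Pending"),
    ("batch-sleep-job-3-p8c7w", "Pending"),
    ("batch-sleep-job-3-td575", "Pending"),
]

if __name__ == "__main__":
    for job, counts in summarize(PODS).items():
        print(job, counts)
```

The first two jobs show only Completed pods (2 and 4), while all 10 pods of batch-sleep-job-3 are Pending.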
Attaching the stack trace, yk.log, and the metrics API response for reference. This was observed with the v0.10 build.
Attachments
Issue Links
- duplicates: YUNIKORN-461 Remove allocations map from the partition (Closed)