Details
- Type: Sub-task
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
- 0.10
Description
Name:           yunikorn-scheduler-6577f789d8-vc5cc
Namespace:      yunikorn
Priority:       0
Node:           ip-10-192-153-109.ca-central-1.compute.internal/10.192.153.109
Start Time:     Tue, 26 Jan 2021 19:17:12 -0800
Labels:         app=yunikorn
                component=yunikorn-scheduler
                pod-template-hash=6577f789d8
                release=yunikorn
Annotations:    cni.projectcalico.org/podIP: 100.100.166.78/32
                cni.projectcalico.org/podIPs: 100.100.166.78/32
                kubernetes.io/psp: eks.privileged
Status:         Running
IP:             100.100.166.78
IPs:
  IP:           100.100.166.78
Controlled By:  ReplicaSet/yunikorn-scheduler-6577f789d8
Containers:
  yunikorn-scheduler-k8s:
    Container ID:   docker://759f2b2f14ba37f46a42cdc59a5c51ed19d442ed717b81ee98d30177b7a184e6
    Image:          <>/cloudera/yunikorn-scheduler:0.10.0-b9
    Image ID:       docker-pullable://<>/cloudera/yunikorn-scheduler@sha256:878300a91cfd3b9d6dc515948afbfab23572a475b0df7006f06480ee06d1aceb
    Port:           9080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Tue, 26 Jan 2021 19:18:01 -0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 26 Jan 2021 19:17:33 -0800
      Finished:     Tue, 26 Jan 2021 19:17:33 -0800
    Ready:          True
    Restart Count:  3
    Limits:
      cpu:     4
      memory:  2Gi
    Requests:
      cpu:     200m
      memory:  1Gi
    Environment:
      NAMESPACE:                                yunikorn (v1:metadata.namespace)
      ADMISSION_CONTROLLER_IMAGE_REGISTRY:      <>/cloudera/yunikorn-admission
      ADMISSION_CONTROLLER_IMAGE_TAG:           0.10.0-b9
      ADMISSION_CONTROLLER_IMAGE_PULL_POLICY:   Always
      ADMISSION_CONTROLLER_IMAGE_PULL_SECRETS:  [dockercreds]
    Mounts:
      /etc/yunikorn/ from config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from yunikorn-admin-token-dnq4h (ro)
  yunikorn-scheduler-web:
    Container ID:   docker://0b8205bb8292f193765bbc563ea10010106fd316257e523c3446c5685ee0d5bf
    Image:          <>/cloudera/yunikorn-web:0.10.0-b9
    Image ID:       docker-pullable://<>/cloudera/yunikorn-web@sha256:a64b986df2dc737958701838f41f9fae7f2e4a353a497949ba6b9e75b4b44b66
    Port:           9889/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Tue, 26 Jan 2021 19:17:17 -0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     200m
      memory:  500Mi
    Requests:
      cpu:     100m
      memory:  100Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from yunikorn-admin-token-dnq4h (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      yunikorn-configs
    Optional:  false
  yunikorn-admin-token-dnq4h:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  yunikorn-admin-token-dnq4h
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  role.node.kubernetes.io/liftie-infra=true
Tolerations:     CriticalAddonsOnly op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
                 role.node.kubernetes.io/liftie-infra=true:NoSchedule
Events:
  Type     Reason               Age                From               Message
  ----     ------               ----               ----               -------
  Normal   Scheduled            61s                default-scheduler  Successfully assigned yunikorn/yunikorn-scheduler-6577f789d8-vc5cc to ip-10-192-153-109.ca-central-1.compute.internal
  Normal   Pulling              57s                kubelet            Pulling image "<>/cloudera/yunikorn-web:0.10.0-b9"
  Normal   Started              56s                kubelet            Started container yunikorn-scheduler-web
  Normal   Created              56s                kubelet            Created container yunikorn-scheduler-web
  Normal   Pulled               56s                kubelet            Successfully pulled image "<>/cloudera/yunikorn-web:0.10.0-b9"
  Warning  FailedPreStopHook    55s (x2 over 58s)  kubelet            Exec lifecycle hook ([/bin/sh /admission_util.sh delete]) for Container "yunikorn-scheduler-k8s" in Pod "yunikorn-scheduler-6577f789d8-vc5cc_yunikorn(082e1cc7-8765-4aa3-baac-48e3b048cfc6)" failed - error: command '/bin/sh /admission_util.sh delete' exited with 126: , message: "cannot exec in a stopped state: unknown\r\n"
  Normal   Killing              55s (x2 over 58s)  kubelet            FailedPostStartHook
  Warning  BackOff              53s (x2 over 54s)  kubelet            Back-off restarting failed container
  Normal   Pulling              41s (x3 over 60s)  kubelet            Pulling image "<>/cloudera/yunikorn-scheduler:0.10.0-b9"
  Warning  FailedPostStartHook  40s (x3 over 58s)  kubelet            Exec lifecycle hook ([/bin/sh /admission_util.sh create]) for Container "yunikorn-scheduler-k8s" in Pod "yunikorn-scheduler-6577f789d8-vc5cc_yunikorn(082e1cc7-8765-4aa3-baac-48e3b048cfc6)" failed - error: command '/bin/sh /admission_util.sh create' exited with 137: , message: ""
  Normal   Started              40s (x3 over 58s)  kubelet            Started container yunikorn-scheduler-k8s
  Normal   Created              40s (x3 over 58s)  kubelet            Created container yunikorn-scheduler-k8s
  Normal   Pulled               40s (x3 over 58s)  kubelet            Successfully pulled image "<>/cloudera/yunikorn-scheduler:0.10.0-b9"
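This is not a blocker, but the scheduler container (yunikorn-scheduler-k8s) was restarted multiple (3) times before reaching the Running state, hence this report. The restarts could be due to an issue in the admission controller start script (/admission_util.sh), whose postStart and preStop lifecycle hooks both failed in the events above.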
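The two hook failures in the events report different exit codes (126 for the preStop hook, 137 for the postStart hook). As a reading aid, the sketch below decodes such codes using the common POSIX shell convention, where 126 means the command could not be executed and values above 128 mean termination by signal number (code - 128). The helper name `describe_exit_code` is hypothetical and not part of YuniKorn or Kubernetes; signal names are resolved through the Python `signal` module and assume a Linux host.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit code using the usual POSIX shell convention."""
    if code == 0:
        return "success"
    if code == 126:
        return "command found but could not be executed"
    if code == 127:
        return "command not found"
    if code > 128:
        # Codes above 128 conventionally encode 128 + signal number.
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"failed with application error {code}"


# Exit codes taken from the FailedPreStopHook and FailedPostStartHook events.
for code in (126, 137):
    print(code, "->", describe_exit_code(code))
```

Under this convention, 137 corresponds to signal 9 (SIGKILL), which would be consistent with the postStart hook being killed while the container was torn down, and 126 matches the "cannot exec in a stopped state" message. This is an interpretation of the events, not a confirmed root cause.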
Attachments
Issue Links
- causes: YUNIKORN-538 Scheduler is unable to recovery from a restart (Closed)