Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
None
Description
When there is an image pull error, this is what we see in the operator log:
org.apache.flink.kubernetes.operator.exception.DeploymentFailedException: Back-off pulling image "flink:1.14"
    at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.checkContainerBackoff(AbstractFlinkDeploymentObserver.java:194)
    at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeJmDeployment(AbstractFlinkDeploymentObserver.java:150)
    at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:84)
    at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:55)
    at org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:56)
    at org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:32)
    at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:113)
    at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:54)
    at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:136)
    at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:94)
    at org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80)
    at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:93)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:130)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:110)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:81)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:54)
    at io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:406)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
This is the information available on the Kubernetes side:
Normal   Scheduled  2m19s              default-scheduler  Successfully assigned default/quickstart-base-86787586cd-lb7j6 to minikube
Warning  Failed     20s                kubelet            Failed to pull image "flink:1.14": rpc error: code = Unknown desc = context deadline exceeded
Warning  Failed     20s                kubelet            Error: ErrImagePull
Normal   BackOff    19s                kubelet            Back-off pulling image "flink:1.14"
Warning  Failed     19s                kubelet            Error: ImagePullBackOff
Normal   Pulling    7s (x2 over 2m19s) kubelet            Pulling image "flink:1.14"
It would be good to include the underlying failure message (in this case: Failed to pull image "flink:1.14": rpc error: code = Unknown desc = context deadline exceeded) in the message of the DeploymentFailedException for traceability.
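A rough sketch of what is being asked for, not the operator's actual code: the class BackoffMessageSketch, the PodEvent type, and the describeBackoff helper are hypothetical names made up for illustration, and PodEvent is a simplified stand-in rather than the Kubernetes client model. It assumes the operator has access to the pod events listed above and appends the most recent detailed pull failure to the back-off message.

```java
import java.util.Arrays;
import java.util.List;

public class BackoffMessageSketch {

    /** Hypothetical, simplified view of a pod event (not the real client model). */
    static final class PodEvent {
        final String type;
        final String reason;
        final String message;

        PodEvent(String type, String reason, String message) {
            this.type = type;
            this.reason = reason;
            this.message = message;
        }
    }

    /**
     * Appends the most recent detailed pull failure to the back-off message.
     * Events whose message is only "Error: <reason>" add no detail and are skipped.
     */
    static String describeBackoff(String backoffMessage, List<PodEvent> events) {
        String detail = null;
        for (PodEvent e : events) {
            boolean failedWarning = "Warning".equals(e.type) && "Failed".equals(e.reason);
            if (failedWarning && e.message != null && !e.message.startsWith("Error:")) {
                detail = e.message; // later events overwrite earlier ones
            }
        }
        return detail == null ? backoffMessage : backoffMessage + " (" + detail + ")";
    }

    public static void main(String[] args) {
        // Events copied from the Kubernetes output in this issue.
        List<PodEvent> events = Arrays.asList(
            new PodEvent("Normal", "Scheduled",
                "Successfully assigned default/quickstart-base-86787586cd-lb7j6 to minikube"),
            new PodEvent("Warning", "Failed",
                "Failed to pull image \"flink:1.14\": rpc error: code = Unknown desc = context deadline exceeded"),
            new PodEvent("Warning", "Failed", "Error: ErrImagePull"),
            new PodEvent("Normal", "BackOff", "Back-off pulling image \"flink:1.14\""),
            new PodEvent("Warning", "Failed", "Error: ImagePullBackOff"),
            new PodEvent("Normal", "Pulling", "Pulling image \"flink:1.14\""));

        System.out.println(describeBackoff("Back-off pulling image \"flink:1.14\"", events));
    }
}
```

With a message built this way, the exception text would carry the rpc error alongside the back-off reason, so the root cause is visible in the operator log without having to inspect the pod events separately.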
Issue Links
- relates to FLINK-29744 Throw DeploymentFailedException on ImagePullBackOff (Closed)