Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
In some failure cases (e.g. the image cannot be pulled), Submarine does not pick up the actual task status, and the job is currently always reported as running.
For example, when an image cannot be pulled, the actual job status is as follows:
status:
  conditions:
    - lastProbeTime: '2023-04-01T03:50:53Z'
      reason: PodInitializing
      type: Waiting
    - lastProbeTime: '2023-04-01T03:50:39Z'
      message: >-
        rpc error: code = Unknown desc = error pulling image configuration: Get "https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/5c/5ccab874feb97b32099f72978f97c8e7d129fbe7577464ad49b43f58f693ca90/data?verify=1680324025-7lKdJkTa1waOdofNoPtnsjwv%2FIQ%3D": EOF
      reason: ErrImagePull
      type: Waiting
    - lastProbeTime: '2023-04-01T03:49:58Z'
      message: >-
        Back-off pulling image "apache/submarine:jupyter-notebook-0.8.0-SNAPSHOT"
      reason: ImagePullBackOff
      type: Waiting
    - lastProbeTime: '2023-04-01T03:49:57Z'
      message: >-
        rpc error: code = Unknown desc = Error response from daemon: Head "https://registry-1.docker.io/v2/apache/submarine/manifests/jupyter-notebook-0.8.0-SNAPSHOT": Get "https://auth.docker.io/token?scope=repository%3Aapache%2Fsubmarine%3Apull&service=registry.docker.io": EOF
      reason: ErrImagePull
      type: Waiting
    - lastProbeTime: '2023-04-01T03:49:54Z'
      reason: PodInitializing
      type: Waiting
  containerState:
    waiting:
      reason: PodInitializing
  readyReplicas: 0
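Therefore, we should refine the status so that failures such as ErrImagePull and ImagePullBackOff are surfaced instead of the job appearing to be running.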
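One possible way to refine the status is sketched below in Python. It assumes the status has already been parsed into a dict shaped like the one shown above. The helper `derive_notebook_status`, the `FAILURE_REASONS` set, and the state names "running", "failed", and "creating" are hypothetical and are not part of Submarine's API. Because the most recent condition in the example is PodInitializing, the sketch looks at all conditions for failure reasons rather than only at the latest one.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical: waiting reasons treated as failures (an assumption, not Submarine's list).
FAILURE_REASONS = {"ErrImagePull", "ImagePullBackOff", "InvalidImageName", "CrashLoopBackOff"}


@dataclass
class DerivedStatus:
    state: str  # hypothetical coarse states: "running", "failed", "creating"
    reason: Optional[str] = None
    message: Optional[str] = None


def _probe_time(cond: dict) -> datetime:
    # lastProbeTime looks like '2023-04-01T03:50:53Z' (UTC, second precision).
    return datetime.strptime(cond["lastProbeTime"], "%Y-%m-%dT%H:%M:%SZ")


def derive_notebook_status(status: dict) -> DerivedStatus:
    """Hypothetical helper: map a parsed status dict to a coarse job state."""
    if status.get("readyReplicas", 0) > 0:
        return DerivedStatus("running")
    # Collect every condition whose reason signals a failure, not just the newest one.
    failures = [c for c in status.get("conditions", []) if c.get("reason") in FAILURE_REASONS]
    if failures:
        latest = max(failures, key=_probe_time)
        return DerivedStatus("failed", latest["reason"], latest.get("message"))
    # Still initializing or waiting without a known failure reason.
    return DerivedStatus("creating")


# Usage with a trimmed copy of the conditions from the issue.
example = {
    "conditions": [
        {"lastProbeTime": "2023-04-01T03:50:53Z", "reason": "PodInitializing", "type": "Waiting"},
        {"lastProbeTime": "2023-04-01T03:50:39Z", "reason": "ErrImagePull", "type": "Waiting"},
        {
            "lastProbeTime": "2023-04-01T03:49:58Z",
            "reason": "ImagePullBackOff",
            "type": "Waiting",
            "message": 'Back-off pulling image "apache/submarine:jupyter-notebook-0.8.0-SNAPSHOT"',
        },
    ],
    "readyReplicas": 0,
}

result = derive_notebook_status(example)
print(result.state, result.reason)  # failed ErrImagePull
```

The EOF errors in this example come from the network, so a real implementation might prefer to report a failure only after repeated pull errors rather than on the first one.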