Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
0.18
None
https://staging.ultrascan.scigap.org, SLURM job ID 8560 on Jetstream
Description
Currently, on the clusters (PBS and SLURM), jobs are sometimes re-queued because of node failures. In these cases the job runs again after re-queueing, but the gateway marks it as FAILED as soon as the initial NODE_FAIL is reported.
These failures need to be recorded as retrying failures instead of being treated as the final result.
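A minimal sketch of the requested behaviour follows. It assumes the gateway polls SLURM (for example through squeue or sacct) and receives each job state as a string. GatewayJobStatus, map_slurm_state, track_job and the max_node_failures retry budget are hypothetical names, not the gateway's actual code. The idea is that NODE_FAIL becomes a RETRYING state when the job can be requeued (in SLURM this is governed by the sbatch --requeue / --no-requeue options and the JobRequeue setting), and FAILED is reported only when the job cannot be requeued or keeps hitting node failures. PBS would need an analogous mapping of its own state codes.

```python
from enum import Enum
from typing import Iterable, Optional


class GatewayJobStatus(Enum):
    """Hypothetical gateway-side job states (not the gateway's real enum)."""
    QUEUED = "QUEUED"
    ACTIVE = "ACTIVE"
    RETRYING = "RETRYING"  # transient failure: the scheduler may run the job again
    COMPLETE = "COMPLETE"
    CANCELED = "CANCELED"
    FAILED = "FAILED"      # terminal failure


ACTIVE_STATES = {"CONFIGURING", "RUNNING", "COMPLETING", "SUSPENDED"}
FAILURE_STATES = {"FAILED", "TIMEOUT", "OUT_OF_MEMORY", "BOOT_FAIL"}


def normalize(raw: str) -> str:
    # sacct can report e.g. "CANCELLED by 1000"; keep only the state keyword.
    parts = raw.split()
    return parts[0].upper() if parts else ""


def map_slurm_state(raw: str, requeue_allowed: bool) -> GatewayJobStatus:
    """Map one SLURM state string to a (hypothetical) gateway status."""
    state = normalize(raw)
    if state == "PENDING":
        return GatewayJobStatus.QUEUED
    if state in ACTIVE_STATES:
        return GatewayJobStatus.ACTIVE
    if state == "COMPLETED":
        return GatewayJobStatus.COMPLETE
    if state == "CANCELLED":
        return GatewayJobStatus.CANCELED
    if state == "REQUEUED":
        return GatewayJobStatus.RETRYING
    if state == "NODE_FAIL":
        # The fix requested here: NODE_FAIL is terminal only if the job cannot be requeued.
        return GatewayJobStatus.RETRYING if requeue_allowed else GatewayJobStatus.FAILED
    if state in FAILURE_STATES:
        return GatewayJobStatus.FAILED
    raise ValueError(f"unrecognized SLURM state: {raw!r}")


def track_job(observed: Iterable[str], requeue_allowed: bool = True,
              max_node_failures: int = 3) -> GatewayJobStatus:
    """Fold successively polled SLURM states into one gateway status."""
    status = GatewayJobStatus.QUEUED
    node_failures = 0
    previous: Optional[str] = None
    for raw in observed:
        state = normalize(raw)
        if state == previous:
            continue  # repeated poll of the same state: nothing new happened
        previous = state
        if state == "NODE_FAIL":
            node_failures += 1
            if node_failures > max_node_failures:
                return GatewayJobStatus.FAILED  # retry budget exhausted
        status = map_slurm_state(state, requeue_allowed)
    return status
```

With this mapping, a history such as PENDING, RUNNING, NODE_FAIL, REQUEUED, PENDING, RUNNING, COMPLETED ends as COMPLETE instead of being frozen at FAILED by the first NODE_FAIL. The retry budget is a design choice so that a job bouncing between failing nodes is still eventually reported as failed.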