Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
Easy
Description
This issue occurs from time to time, most recently in the production Ultrascan gateway, https://django.ultrascan.scigap.org/. This gateway is connected to the production stack and the Django portal for admin operations.
When a node failure happens after a job has been submitted and queued, and the failure is reported through an email notification, the job goes to the UNKNOWN state in the gateway. On the remote cluster, the job is re-queued and completes, and email notifications are sent. However, Helix treats UNKNOWN as a final job state and does not process the emails that arrive afterwards.
Currently, when this happens, an operational task takes care of updating the job status and processing the email notifications sent.
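One possible direction for the improvement, sketched under assumptions: keep UNKNOWN out of the set of final states so that later email notifications can still move the job forward. The state names, the `TERMINAL_STATES` set, and the `apply_email_update` function below are hypothetical and only illustrate the idea; they are not taken from the Airavata or Helix code.

```python
from enum import Enum


class JobState(Enum):
    # Illustrative state names; the real job state model may differ.
    SUBMITTED = "SUBMITTED"
    QUEUED = "QUEUED"
    ACTIVE = "ACTIVE"
    COMPLETE = "COMPLETE"
    FAILED = "FAILED"
    CANCELED = "CANCELED"
    UNKNOWN = "UNKNOWN"


# Assumption: only these states are truly final. UNKNOWN is deliberately
# left out, because the cluster may re-queue the job after a node failure.
TERMINAL_STATES = frozenset(
    {JobState.COMPLETE, JobState.FAILED, JobState.CANCELED}
)


def apply_email_update(current: JobState, reported: JobState) -> JobState:
    """Return the state a job should hold after an email reports `reported`."""
    if current in TERMINAL_STATES:
        # The job has already finished; ignore late or duplicate emails.
        return current
    # Non-final states, including UNKNOWN, accept the newly reported state.
    return reported
```

With this rule, a job that went to UNKNOWN after the node failure would still pick up the QUEUED and COMPLETE notifications sent once the cluster re-queues it, while a job that has already reached COMPLETE would ignore any later email.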