Details
Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Description
Job 235131 had a large string of JPs fail (when the JD OOM'd) with:
HttpWorkerThread.run() I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server 192.168.3.77 failed to respond
In the short term we should count this as a Croak (i.e. an unexpected termination that DUCC didn't request), even though it is not caused by user error, so that the user's process_failures_limit can eventually end the job.
Perhaps we need a "framework_failures_limit" in ducc.properties for errors caught in the ducc-side JP code as opposed to errors caught in user code.
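To make the two options concrete, here is a minimal sketch of how the counting could work. It is not taken from the DUCC code base: the class `FailureLimits`, the enum `Origin`, and the `record` method are hypothetical names, and the rule that a job ends once a count reaches its limit is an assumption. A negative framework limit stands for the short-term behaviour, where framework failures count against process_failures_limit; a non-negative value stands for the proposed separate "framework_failures_limit".

```java
/**
 * Hypothetical sketch only; none of these names exist in DUCC.
 * Tracks failures caught in user code separately from failures
 * caught in the ducc-side JP code and decides when to end the job.
 */
final class FailureLimits {

    /** Where the failure was caught. */
    enum Origin { USER_CODE, FRAMEWORK }

    private final int processFailuresLimit;
    // Negative means "no separate limit": framework failures are
    // counted against processFailuresLimit (the short-term proposal).
    private final int frameworkFailuresLimit;

    private int userFailures;
    private int frameworkFailures;

    FailureLimits(int processFailuresLimit, int frameworkFailuresLimit) {
        this.processFailuresLimit = processFailuresLimit;
        this.frameworkFailuresLimit = frameworkFailuresLimit;
    }

    /**
     * Records one failure and reports whether the job should now end.
     * Assumption: the job ends once a count reaches its limit.
     */
    boolean record(Origin origin) {
        if (origin == Origin.USER_CODE) {
            userFailures++;
        } else {
            frameworkFailures++;
        }
        if (frameworkFailuresLimit < 0) {
            // Short-term: every Croak counts toward the user's limit.
            return userFailures + frameworkFailures >= processFailuresLimit;
        }
        // Longer-term: each kind of failure has its own limit.
        return userFailures >= processFailuresLimit
                || frameworkFailures >= frameworkFailuresLimit;
    }
}
```

With `new FailureLimits(3, -1)`, a run of NoHttpResponseException failures like the one in job 235131 would end the job on the third failure, whichever side caught it. With `new FailureLimits(3, 2)`, two framework-side failures would end the job without using up the user's own limit.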