UIMA / UIMA-4321

DUCC should not retry JPs forever when the JP framework fails

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0-Ducc
    • Component/s: DUCC
    • Labels: None

    Description

      Job 235131 had a long run of JPs (Job Processes) fail when the JD (Job Driver) ran out of memory, each with:
      HttpWorkerThread.run() I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server 192.168.3.77 failed to respond

      For the short term, we should count this as a Croak (i.e. an unexpected termination that DUCC didn't request), even though it is not caused by user error, so that the user's process_failures_limit can eventually end the job.
      Longer term, perhaps we need a "framework_failures_limit" in ducc.properties for errors caught in the DUCC-side JP code, as opposed to errors caught in user code.
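
The idea of a separate limit can be illustrated with a minimal sketch. Everything below is hypothetical: the class `FailureLimitTracker`, the enum `FailureKind`, and the method names are assumptions made for illustration and are not taken from the DUCC code base. It only shows how failures caught in DUCC-side JP code could be counted apart from failures in user code, so that either limit can end the job instead of JPs being retried forever.

```java
// Hypothetical sketch: none of these names come from the DUCC code base.
final class FailureLimitTracker {

    /** Where the JP failure was caught (assumed classification). */
    enum FailureKind { USER_CODE, FRAMEWORK }

    private final int processFailuresLimit;   // mirrors the user's process_failures_limit
    private final int frameworkFailuresLimit; // mirrors the proposed framework_failures_limit
    private int userFailures;
    private int frameworkFailures;

    FailureLimitTracker(int processFailuresLimit, int frameworkFailuresLimit) {
        if (processFailuresLimit < 1 || frameworkFailuresLimit < 1) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.processFailuresLimit = processFailuresLimit;
        this.frameworkFailuresLimit = frameworkFailuresLimit;
    }

    /** Records one JP failure and returns true if the job should now be stopped. */
    boolean recordFailure(FailureKind kind) {
        if (kind == FailureKind.FRAMEWORK) {
            frameworkFailures++;
        } else {
            userFailures++;
        }
        return shouldStopJob();
    }

    /** The job stops as soon as either counter reaches its limit. */
    boolean shouldStopJob() {
        return userFailures >= processFailuresLimit
            || frameworkFailures >= frameworkFailuresLimit;
    }
}
```

With limits of, say, 2 user failures and 3 framework failures, a third consecutive NoHttpResponseException-style failure would stop the job even though no user code failed. The short-term proposal above corresponds to counting both kinds against the same process_failures_limit.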

          People

            cwiklik Jaroslaw Cwiklik
            burn Burn L. Lewis