  UIMA / UIMA-5048

DUCC Orchestrator (OR) record Process Manager (PM) Job CommandLine requests

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.0-Ducc
    • Component/s: DUCC
    • Labels:
      None

      Description

      On uima-ducc-demo we saw a Job that caused the PM to run out of memory (OOM). According to the PM log, the PM's request to the Orchestrator for Job 784's CommandLine (which includes the CLASSPATH) unexpectedly returned null.

      1. Add more logging to the OR to understand why a null value was returned to the PM
      2. The PM should prevent such Job Processes from launching, since they have no command line
      3. Increase the PM's -Xmx on uima-ducc-demo from 150M to 200M (the same as SM and WS)
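Item 1 asks the OR to log enough to explain a null answer. The following is a minimal Python sketch (DUCC itself is Java), assuming the OR keeps a lookup from job id to command line; the logger name `ducc.or`, the function `handle_pm_request`, and the `command_lines` dict are hypothetical illustrations, not DUCC's actual code. It logs every PM request and logs a null result at WARNING level so it stands out.

```python
import logging

# Hypothetical logger name; DUCC's real OR logging setup is not shown in the issue.
logger = logging.getLogger("ducc.or")

def handle_pm_request(job_id, command_lines):
    """Return the command line for job_id, logging the request and its outcome.

    command_lines is a hypothetical dict: job id -> command line, or None
    when the OR has no command line for that job.
    """
    logger.info("PM requested command line for job %s", job_id)
    cmd = command_lines.get(job_id)
    if cmd is None:
        # Record the null result explicitly so its cause can be traced later.
        logger.warning("No command line available for job %s; returning null to PM", job_id)
    else:
        logger.info("Returning command line for job %s (%d chars)", job_id, len(cmd))
    return cmd
```

Logging the null case at a higher level than the routine request keeps it visible even when INFO output is filtered out.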

        Activity

        cwiklik Jerry Cwiklik added a comment -

        Modified the PM to ignore tasks (Jobs and Services) for which there is no command line. This is the case when the OR returns null for an explicit request to provide the command line for a Job or Service.
        Modified the OR to log PM requests.
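The PM-side change described above amounts to partitioning tasks before launch. Here is a minimal Python sketch under stated assumptions: `Task` and `split_launchable` are hypothetical names, and treating a blank string the same as null is an assumption beyond what the comment states.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Task:
    # Hypothetical stand-in for a PM Job or Service entry.
    task_id: str
    kind: str  # "job" or "service"
    command_line: Optional[str]

def split_launchable(tasks: List[Task]) -> Tuple[List[Task], List[Task]]:
    """Partition tasks into those the PM can launch and those it ignores."""
    launchable, ignored = [], []
    for t in tasks:
        # Null (or, by assumption, blank) command line: nothing to launch.
        if t.command_line is not None and t.command_line.strip():
            launchable.append(t)
        else:
            ignored.append(t)
    return launchable, ignored
```

Returning the ignored tasks, rather than silently dropping them, leaves room for the PM to report them.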

        cwiklik Jerry Cwiklik added a comment -

        Need to revisit this issue. When the PM is unable to connect to the OR, it should throw away the OR state.

        Check the code to see whether the HTTP wrapper in the PM handles connectivity problems. It appears to swallow an exception and return null for job details. Modify the code to throw an exception, which the PM component should catch and handle as described above.
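The proposed change separates "cannot reach the OR" from "the OR answered null". The following is a minimal Python sketch of that pattern, not DUCC's Java code. `OrchestratorUnreachable`, `fetch_job_details`, `ProcessManager`, and the `transport` callable (standing in for the HTTP call) are all hypothetical. Treating `OSError` as the connectivity signal is an assumption: Python's `ConnectionError`, socket timeouts, and `urllib.error.URLError` all derive from it.

```python
from typing import Callable, Dict, Optional

class OrchestratorUnreachable(Exception):
    """Hypothetical exception raised when the PM cannot reach the OR."""

def fetch_job_details(transport: Callable[[str], Optional[dict]], job_id: str) -> Optional[dict]:
    """Fetch job details via a hypothetical transport callable.

    Connectivity failures are re-raised instead of being turned into None,
    so a None return means the OR itself answered with null.
    """
    try:
        return transport(job_id)
    except OSError as exc:
        raise OrchestratorUnreachable(f"cannot reach OR for job {job_id}") from exc

class ProcessManager:
    """Hypothetical PM holding a cache of OR state."""

    def __init__(self, transport: Callable[[str], Optional[dict]]):
        self.transport = transport
        self.or_state: Dict[str, dict] = {}

    def refresh(self, job_id: str) -> Optional[dict]:
        try:
            details = fetch_job_details(self.transport, job_id)
        except OrchestratorUnreachable:
            # Connectivity problem: throw away the cached OR state.
            self.or_state.clear()
            return None
        if details is not None:
            self.or_state[job_id] = details
        return details
```

Raising from the wrapper keeps the decision about what to discard in the PM component, which is where the comment places it.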

        If the OR returns null for job details, the PM should keep doing what it does now: send an update to the agent, which detects the missing cmdline and marks the process as FAILED with reason=MissingCommandLine.
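On the agent side, the behavior described above is a simple check per process. This is a minimal Python sketch. `agent_handle_process_map` and the plain-dict process map are hypothetical. `FAILED` and `MissingCommandLine` come from the comment, while `Launching` is an assumed placeholder state.

```python
from typing import Dict, Optional, Tuple

def agent_handle_process_map(
    process_map: Dict[str, Optional[str]],
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Hypothetical agent-side check of a process map received from the PM.

    process_map maps a process id to its command line (None when missing).
    Returns process id -> (state, reason).
    """
    results: Dict[str, Tuple[str, Optional[str]]] = {}
    for pid, cmd in process_map.items():
        if cmd is None:
            # Nothing to exec: fail the process and record why.
            results[pid] = ("FAILED", "MissingCommandLine")
        else:
            # "Launching" is an assumed placeholder, not a DUCC state name.
            results[pid] = ("Launching", None)
    return results
```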

        cwiklik Jerry Cwiklik added a comment -

        When a command line is missing, the PM logs a message indicating whether this is due to a communication error or to the OR sending an invalid cmd line. The PM then publishes the process map to the agents, which mark a job with no cmdline as Failed with reason=MissingCommandLine.
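Distinguishing the two causes before logging could look like the following minimal Python sketch. `MissingCause`, `resolve_command_line`, the logger name `ducc.pm`, and the `fetch` callable are hypothetical. Treating `OSError` as a communication error and an empty string as an invalid command line are assumptions.

```python
import enum
import logging

# Hypothetical logger name for the PM.
log = logging.getLogger("ducc.pm")

class MissingCause(enum.Enum):
    COMMUNICATION_ERROR = "communication error with OR"
    OR_RETURNED_NULL = "OR returned no valid command line"

def resolve_command_line(fetch, job_id):
    """Return (command_line, cause); cause is None when a command line was obtained."""
    try:
        cmd = fetch(job_id)
    except OSError:
        # Assumed connectivity signal (ConnectionError, timeouts, URLError).
        cause = MissingCause.COMMUNICATION_ERROR
    else:
        if cmd:
            return cmd, None
        cause = MissingCause.OR_RETURNED_NULL
    log.warning("Job %s has no command line: %s", job_id, cause.value)
    return None, cause
```

The returned cause lets the caller log or act on each case differently while still publishing the job with no command line.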


          People

          • Assignee:
            cwiklik Jerry Cwiklik
            Reporter:
            lou.degenaro Lou DeGenaro
          • Votes:
            0
            Watchers:
            2

