  1. UIMA
  2. UIMA-3659

DUCC Job Driver (JD) OOMs when Total number of work items is large

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0-Ducc
    • Fix Version/s: 1.1.0-Ducc
    • Component/s: DUCC
    • Labels: None

    Description

      A Job with 300,000+ Total work items failed with Reason "Premature" after processing 70,000+ of them.

      The Job Driver (JD) maintains a file named work-item-status.json.gz in the user's log directory. It contains the information that the WebServer shows on the Work Items tab of the Job Details page. As each work item is processed, the JD's WorkItemStateManager (WiSm) maintains an in-memory record of its Id, Node, PID, State, and Start and End times. Periodically, the JD calls the WiSm's export method to rewrite this file.
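
The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the DUCC source: the class name KeepEverythingTracker, its method signatures, and the JSON layout are all assumptions made for the example. It shows why memory grows: every record stays in the map, and each export rewrites the whole compressed file.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPOutputStream;

/** Hypothetical stand-in for the behavior described in this issue; not the real WiSm. */
class KeepEverythingTracker {

    // One entry per work item, never removed: this is the growth the issue describes.
    private final Map<String, String> recordsById = new LinkedHashMap<>();

    /** Records or overwrites the latest state of one work item (fields as listed in the issue). */
    synchronized void update(String id, String node, int pid, String state, long start, long end) {
        // Assumed JSON layout; values are not escaped, which is acceptable only for a sketch.
        String json = "{\"id\":\"" + id + "\",\"node\":\"" + node + "\",\"pid\":" + pid
                + ",\"state\":\"" + state + "\",\"start\":" + start + ",\"end\":" + end + "}";
        recordsById.put(id, json);
    }

    /** Number of records held in memory, including finished work items. */
    synchronized int size() {
        return recordsById.size();
    }

    /** Rewrites the whole gzip file from the in-memory map, replacing any previous content. */
    synchronized void export(Path target) throws IOException {
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(target)), StandardCharsets.UTF_8)) {
            w.write("[");
            boolean first = true;
            for (String json : recordsById.values()) {
                if (!first) {
                    w.write(",");
                }
                w.write(json);
                first = false;
            }
            w.write("]");
        }
    }
}
```

With this shape, the map holds one entry for every work item ever seen, so both heap usage and the cost of each export scale with the Total number of work items rather than with the number currently being processed.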

      Although each work item's record is small, the records are kept in memory for the lifetime of the Job, so memory consumption grows with the total number of work items and becomes large for large Jobs.

      To alleviate this "designed-in" memory leak, the WiSm should keep only active work items in memory and flush terminal work items to disk. This change affects the DUCC components that use the WiSm, specifically the CLI, WS, and JD.
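
One way the proposed change could look is sketched below. It is an illustration under stated assumptions, not the fix that was committed: the class ActiveOnlyWorkItemTracker, the State values, and the one-JSON-object-per-line file layout are hypothetical. The idea is that a work item is held in the map only while it is active; once it reaches a terminal state, its record is appended to a file and dropped from memory.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of "keep only active items in memory"; not the DUCC implementation. */
class ActiveOnlyWorkItemTracker {

    /** Assumed state names; the real DUCC work item states may differ. */
    enum State {
        QUEUED, OPERATING, ENDED, ERROR;

        boolean isTerminal() {
            return this == ENDED || this == ERROR;
        }
    }

    /** The per-item fields named in the issue: Id, Node, PID, State, Start and End times. */
    static final class WorkItem {
        final String id;
        final String node;
        final int pid;
        final State state;
        final long start;
        final long end;

        WorkItem(String id, String node, int pid, State state, long start, long end) {
            this.id = id;
            this.node = node;
            this.pid = pid;
            this.state = state;
            this.start = start;
            this.end = end;
        }

        /** Assumed JSON layout; values are not escaped, which is acceptable only for a sketch. */
        String toJson() {
            return "{\"id\":\"" + id + "\",\"node\":\"" + node + "\",\"pid\":" + pid
                    + ",\"state\":\"" + state + "\",\"start\":" + start + ",\"end\":" + end + "}";
        }
    }

    private final Map<String, WorkItem> active = new HashMap<>();
    private final Path terminalLog;

    ActiveOnlyWorkItemTracker(Path terminalLog) {
        this.terminalLog = terminalLog;
    }

    /** Keeps active items in memory; writes terminal items to disk and forgets them. */
    synchronized void update(WorkItem item) throws IOException {
        if (item.state.isTerminal()) {
            appendToDisk(item);      // persist first, so the record is not lost
            active.remove(item.id);  // then release the memory
        } else {
            active.put(item.id, item);
        }
    }

    /** Number of work items currently held in memory. */
    synchronized int activeCount() {
        return active.size();
    }

    /** Appends one JSON object per line, creating the file if needed. */
    private void appendToDisk(WorkItem item) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(terminalLog, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            w.write(item.toJson());
            w.newLine();
        }
    }
}
```

In this sketch memory is bounded by the number of work items in flight rather than by the Total. The trade-off is that any reader of the status data, such as the CLI or WS, would have to combine the on-disk terminal records with the active set, which is consistent with the issue's note that several components are affected.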

          People

            Assignee: Lou DeGenaro (lou.degenaro)
            Reporter: Lou DeGenaro (lou.degenaro)