UIMA / UIMA-5528

UIMA-DUCC: improve agent monitoring of cgroups

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: future-DUCC
    • Component/s: DUCC
    • Labels: None

      Description

      Currently, the agent validates a node's cgroups only at startup. On older versions of Red Hat, the cgroup memory subsystem has been observed to disappear because of an OS bug. All subsequent jobs then fail because their cgroups cannot be created.

      Modify the agent's node monitoring to test cgroup creation at regular intervals. This check should be part of node metrics collection. If cgroup creation fails, the agent should mark the cgroup state as 'Broken'. duccmon will display this new state.
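
      The proposed check could look roughly like the sketch below. It is only an illustration in Python, not the agent's actual code. It assumes that creating a cgroup can be approximated by creating and removing a directory under the memory subsystem mount point (as with the cgroup v1 filesystem interface). The function names, the `cgroup_state` key, and the healthy state name `Ok` are hypothetical; only the 'Broken' state comes from this issue.

```python
import os
import uuid


def probe_cgroup_creation(base_dir):
    """Try to create and remove a throwaway child directory under base_dir.

    Assumption: base_dir is the memory subsystem mount point, where creating
    a directory creates a cgroup. Returns True on success, False otherwise.
    """
    probe = os.path.join(base_dir, "probe-" + uuid.uuid4().hex)
    try:
        os.mkdir(probe)   # fails if the subsystem has disappeared
        os.rmdir(probe)   # clean up so probes do not accumulate
        return True
    except OSError:
        return False


def collect_node_metrics(cgroup_base_dir):
    """Hypothetical metrics step: report the cgroup state with each collection."""
    healthy = probe_cgroup_creation(cgroup_base_dir)
    # 'Broken' is the state named in this issue; 'Ok' is a placeholder name.
    return {"cgroup_state": "Ok" if healthy else "Broken"}
```

      Running the probe inside the metrics collection step means the state is refreshed on every collection interval without a separate timer, so a subsystem that vanishes after startup would be reported in the next metrics update.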

            People

            • Assignee: Jaroslaw Cwiklik (cwiklik)
            • Reporter: Jaroslaw Cwiklik (cwiklik)