UIMA / UIMA-5528

UIMA-DUCC: improve agent monitoring of cgroups


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Abandoned
    • Affects Version/s: None
    • Fix Version/s: future-DUCC
    • Component/s: DUCC
    • Labels: None

    Description

      Currently the agent validates a node's cgroups at startup only. On older versions of Red Hat, the cgroup memory subsystem has been observed to disappear because of an OS bug. After that happens, all jobs fail because their cgroups cannot be created.

      Modify the agent's node monitoring to test cgroup creation at regular intervals. This check should be part of the node metrics collection. If cgroup creation fails, the agent should mark the state of cgroups as 'Broken'. This new state will be displayed by duccmon.
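
      The proposed check could look roughly like the sketch below. It is an illustration only, not DUCC code: the function names, the probe directory name, and the idea of probing with a plain mkdir/rmdir under the memory hierarchy are all assumptions (on cgroup v1, creating a directory inside a mounted hierarchy creates a cgroup). The demo at the bottom uses a temporary directory as a stand-in for the real mount point so it can run anywhere.

```python
import os
import tempfile
from enum import Enum


class CgroupState(Enum):
    """Hypothetical states; only 'Broken' is named in the issue."""
    OK = "OK"
    BROKEN = "Broken"


def probe_cgroup_creation(base_dir, probe_name="ducc-probe"):
    """Try to create and remove a throwaway cgroup under base_dir.

    base_dir is assumed to be the mount point of the memory hierarchy
    (for example /sys/fs/cgroup/memory on a cgroup v1 system).
    """
    probe_path = os.path.join(base_dir, f"{probe_name}-{os.getpid()}")
    try:
        os.mkdir(probe_path)  # fails if the subsystem has disappeared
        os.rmdir(probe_path)  # clean up so probes do not accumulate
    except OSError:
        return CgroupState.BROKEN
    return CgroupState.OK


def collect_node_metrics(base_dir):
    """Hypothetical metrics hook: run the probe on every collection cycle."""
    return {"cgroups": probe_cgroup_creation(base_dir).value}


if __name__ == "__main__":
    # Stand-in for the real cgroup mount point (assumption for the demo).
    with tempfile.TemporaryDirectory() as base:
        print(collect_node_metrics(base))
        print(collect_node_metrics(os.path.join(base, "missing")))
```

      Because the probe runs inside the metrics collection, its interval follows the existing metrics publishing schedule, and the resulting state travels with the node metrics to whatever displays them.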

          People

            Assignee: Jaroslaw Cwiklik (cwiklik)
            Reporter: Jaroslaw Cwiklik (cwiklik)
            Votes: 0
            Watchers: 2