Currently the agent validates a node's cgroups only at startup. On older versions of RedHat, the cgroup memory subsystem has been observed to disappear because of an OS bug. After that happens, all jobs on the node fail because their cgroups cannot be created.
Modify the agent's node monitoring to test cgroup creation at regular intervals. This check should be part of the node metrics collection. If cgroup creation fails, the agent should mark the state of cgroups on that node as 'Broken'. This new state will be displayed by duccmon.
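
The proposed check could be sketched roughly as follows. This is only an illustration, not the agent's actual code: the class name CgroupProbe, the CgroupState enum, and the probe() method are hypothetical, and the sketch assumes a cgroup can be created by making a directory under the cgroup hierarchy (as with a mounted cgroup v1 filesystem). The agent may create cgroups through a different mechanism, in which case the probe would call that mechanism instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CgroupProbe {

    // Hypothetical state names; only 'Broken' is taken from the description above.
    public enum CgroupState { OK, BROKEN }

    /**
     * Tries to create and then remove a throwaway cgroup under the given base
     * directory. Assumes creating a directory there is equivalent to creating
     * a cgroup, which holds for a mounted cgroup v1 hierarchy.
     */
    public static CgroupState probe(Path cgroupBaseDir) {
        // Unique name so the probe never collides with a real job's cgroup.
        Path testCgroup = cgroupBaseDir.resolve("probe-" + System.nanoTime());
        try {
            Files.createDirectory(testCgroup);
            Files.delete(testCgroup);
            return CgroupState.OK;
        } catch (IOException | SecurityException e) {
            // Creation or cleanup failed, e.g. because the subsystem
            // is no longer mounted or the base directory is gone.
            return CgroupState.BROKEN;
        }
    }
}
```

In this sketch the node metrics collector would call probe() once per collection interval and publish the returned state with the other node metrics, so that duccmon can display it.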