Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Not A Problem
Affects Version/s: 2.7.1
Fix Version/s: None
Component/s: None
Description
I am currently running into orphaned container processes. Here is the scenario:
Under heavy task load, CPU usage on the NM machine can reach almost 100%. When a container receives a kill event, it is sent SIGTERM, and the parent process then exits, leaving the container process to the OS. The container process still needs to handle shutdown events or other logic, but it can hardly get any CPU time. We would expect to see a SIGKILL, since there is a DelayedProcessKiller, but the parent process, whose pid was recorded as the container pid, no longer exists, so the kill command cannot reach the container process. This is how the orphaned container process comes about.
The orphaned process does exit eventually, but this can take a very long time and makes the state of the OS worse. From what I observed, it can take several hours.
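The sequence described above can be reproduced outside Hadoop with a small Python sketch (POSIX only). This is not NodeManager code: `PARENT_SRC`, `demo`, and the `delay` parameter are hypothetical names chosen for illustration, and the short sleep merely stands in for the grace period before the delayed SIGKILL. The sketch sends SIGTERM to the recorded parent pid, waits, then shows that a SIGKILL aimed at that same pid no longer finds a process while the child is still running. The final process-group kill is only there to clean up the demo and to show one way a signal could still reach the orphan; it is not meant to describe what YARN actually does.

```python
import os
import signal
import subprocess
import sys
import time

# Hypothetical stand-in for a container launch script: it starts a
# long-running child, reports the child's pid, and then waits.
PARENT_SRC = r"""
import subprocess, sys, time
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
print(child.pid, flush=True)
time.sleep(60)
"""


def demo(delay=0.2):
    """SIGTERM the recorded parent pid, then try a delayed SIGKILL on it."""
    parent = subprocess.Popen(
        [sys.executable, "-c", PARENT_SRC],
        stdout=subprocess.PIPE,
        text=True,
        start_new_session=True,  # parent becomes leader of a new process group
    )
    pgid = parent.pid  # group id equals the leader's pid
    child_pid = int(parent.stdout.readline())

    # Step 1: SIGTERM goes to the recorded pid only; the parent exits.
    os.kill(parent.pid, signal.SIGTERM)
    parent.wait()  # reap the parent so its pid is really gone

    # Step 2: grace period before the delayed SIGKILL (assumed value).
    time.sleep(delay)

    # Step 3: the delayed SIGKILL targets the same pid, which no longer exists.
    try:
        os.kill(parent.pid, signal.SIGKILL)
        sigkill_delivered = True
    except ProcessLookupError:
        sigkill_delivered = False

    # The child never received any signal, so it is still running as an orphan.
    try:
        os.kill(child_pid, 0)  # signal 0 only checks that the pid exists
        orphan_alive = True
    except ProcessLookupError:
        orphan_alive = False

    # Cleanup: signalling the whole process group still reaches the orphan.
    os.killpg(pgid, signal.SIGKILL)
    parent.stdout.read()  # returns at EOF, once the orphan's pipe end closes

    return {"sigkill_delivered": sigkill_delivered, "orphan_alive": orphan_alive}


if __name__ == "__main__":
    print(demo())
```

In this sketch the orphan survives only because every signal is addressed to the parent's pid alone; addressing the process group instead is what lets the cleanup step reach it.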