Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.1.2
None
None
Environment: HDP 3.1.0.78
Description
The NodeManager does not clean up the local filecache directory, even when its size exceeds the limit configured in yarn-site.xml. The relevant configuration in yarn-site.xml is:
<property>
<name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
<value>600000</value>
</property>
<property>
<name>yarn.nodemanager.localizer.cache.target-size-mb</name>
<value>10240</value>
</property>
<property>
<name>yarn.nodemanager.container-executor.class</name>
<value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
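To make the configured limits concrete, the following Python sketch parses the three properties above (wrapped in a `<configuration>` root element, as in yarn-site.xml) and converts them to familiar units: one cleanup pass every 10 minutes and a target cache size of 10 GiB. The helper name `load_properties` is hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# The three properties quoted in this report, wrapped in a <configuration>
# root element so they parse as a single XML document.
YARN_SITE = """<configuration>
<property>
<name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
<value>600000</value>
</property>
<property>
<name>yarn.nodemanager.localizer.cache.target-size-mb</name>
<value>10240</value>
</property>
<property>
<name>yarn.nodemanager.container-executor.class</name>
<value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
</configuration>"""


def load_properties(xml_text: str) -> dict[str, str]:
    """Return a name -> value mapping for every <property> element."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.iter("property")
    }


props = load_properties(YARN_SITE)
interval_ms = int(props["yarn.nodemanager.localizer.cache.cleanup.interval-ms"])
target_mb = int(props["yarn.nodemanager.localizer.cache.target-size-mb"])

# 600000 ms is a 10-minute interval; 10240 MB is a 10 GiB target.
interval_minutes = interval_ms / 60_000
target_gib = target_mb / 1024
print(f"cleanup every {interval_minutes:g} min, target {target_gib:g} GiB")
```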
I use Docker to run my program, and inside the Docker container the program downloads files from HDFS to a local directory. However, after the container is killed or exits, the NodeManager does not clean up these files, so the filecache directory keeps growing until the node becomes unhealthy. The container is started with mounted directories like this:
-v=/data1/hadoop/yarn/local/filecache/2115/models.tar.gz/models:/home/hadoop/xdl/models:rw -v=/data1/hadoop/yarn/local/filecache/2116:/data1/hadoop/yarn/local/filecache/2116 -v=/data1/hadoop/yarn/local/filecache/2117:/data1/hadoop/yarn/local/filecache/2117
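The hypothetical helper below extracts, from the `-v` flags above, which numbered filecache entries are bind-mounted into the container. All three mounts are writable from inside the container (the first explicitly via `rw`, the other two because Docker bind mounts default to read-write), so anything the program writes there lands directly in the NodeManager's filecache directories on the host.

```python
import re

# The -v flags from the docker start command quoted above.
DOCKER_MOUNTS = (
    "-v=/data1/hadoop/yarn/local/filecache/2115/models.tar.gz/models:/home/hadoop/xdl/models:rw "
    "-v=/data1/hadoop/yarn/local/filecache/2116:/data1/hadoop/yarn/local/filecache/2116 "
    "-v=/data1/hadoop/yarn/local/filecache/2117:/data1/hadoop/yarn/local/filecache/2117"
)

# Matches the numbered entry directory directly under a filecache root.
FILECACHE_ID = re.compile(r"/filecache/(\d+)(?:/|$)")


def mounted_filecache_ids(flags: str) -> list[str]:
    """Return the filecache entry IDs whose host paths are bind-mounted."""
    ids = []
    for flag in flags.split():
        spec = flag.removeprefix("-v=")
        # The host path is everything before the first ':' (host:container[:mode]).
        host_path = spec.split(":", 1)[0]
        match = FILECACHE_ID.search(host_path)
        if match:
            ids.append(match.group(1))
    return ids


print(mounted_filecache_ids(DOCKER_MOUNTS))  # ['2115', '2116', '2117']
```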
For example, the filecache directory size is:
$ sudo du -sh .
112G    .
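`du -sh` reports only the total. A rough Python equivalent that breaks the total down per filecache entry can show which entries account for the 112G. `tree_size` and `largest_entries` are hypothetical helper names. The sketch sums apparent file sizes, which can differ from the block usage that `du` reports (for example with sparse files), and reading the cache directory may need the same elevated permissions as the `sudo du` call.

```python
import os
from pathlib import Path


def tree_size(path: Path) -> int:
    """Sum apparent sizes of regular files under path, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def largest_entries(filecache: Path, top: int = 10) -> list[tuple[str, int]]:
    """Return (entry name, bytes) for the largest subdirectories of filecache."""
    sizes = [
        (entry.name, tree_size(entry))
        for entry in filecache.iterdir()
        if entry.is_dir() and not entry.is_symlink()
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    # Cache root taken from the mount paths in this report.
    cache_root = Path("/data1/hadoop/yarn/local/filecache")
    if cache_root.is_dir():
        for name, size in largest_entries(cache_root):
            print(f"{size / 2**30:8.1f} GiB  {name}")
```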
However, the NodeManager still does not clean it up, even though the cache target size is set to 10 GB.
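For reference, the behavior expected from these settings is size-bounded LRU cleanup: on each cleanup pass, delete the least recently used cache entries until the cache fits under the target size. The sketch below is a simplified model of that policy, not the NodeManager's implementation. The `CacheEntry` record and `select_for_deletion` function are hypothetical, the model assumes that entries still referenced by a running container are skipped, and the entry IDs, sizes, timestamps, and reference counts in the example are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    # Hypothetical record for one localized resource in filecache.
    entry_id: str
    size_mb: int
    last_used: float  # timestamp of last use; smaller means older
    ref_count: int  # number of running containers using the entry


def select_for_deletion(entries: list[CacheEntry], target_mb: int) -> list[str]:
    """Pick unreferenced entries, least recently used first, until the
    remaining total fits within target_mb (when that is possible)."""
    total = sum(e.size_mb for e in entries)
    victims = []
    for entry in sorted(entries, key=lambda e: e.last_used):
        if total <= target_mb:
            break
        if entry.ref_count > 0:
            continue  # assumed: entries still in use are never deleted
        victims.append(entry.entry_id)
        total -= entry.size_mb
    return victims


entries = [
    CacheEntry("2114", size_mb=50_000, last_used=1.0, ref_count=0),
    CacheEntry("2115", size_mb=40_000, last_used=2.0, ref_count=1),
    CacheEntry("2116", size_mb=30_000, last_used=3.0, ref_count=0),
    CacheEntry("2117", size_mb=5_000, last_used=4.0, ref_count=0),
]
print(select_for_deletion(entries, target_mb=10_240))  # ['2114', '2116', '2117']
```

Because the model decides purely from each entry's recorded `size_mb`, bytes written into an entry's directory after it was recorded would not affect its decisions, and an entry that stays referenced keeps the cache above the target however large it grows.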