I doubt that the directory scanner is the cause of the OOM error. It is probably happening due to some other issue. How many blocks per storage directory did you have when the OOME happened?
We analyzed a DN heap dump from a production cluster with Eclipse Memory Analyzer and found that the memory was full of ScanInfo objects. The memory histogram showed that java.lang.String was the third-largest consumer of memory in the system. Unfortunately, I can't share the heap dump.
I have a hard time understanding the picture. How many bytes are we saving per ScanInfo?
In the particular case shown in memory-analysis.png, we save 86 characters in each string. The volume prefix that we avoid storing is /home/cmccabe/hadoop4/hadoop-hdfs-project/hadoop-hdfs/build//test/data/dfs/data/data1/. Java uses 2 bytes per character (UCS-2 encoding), and we store both metaPath and blockPath, so multiply 86 by 4 to get 344 bytes. Then add the overhead of using two File objects that wrap the path strings instead of just the strings themselves, probably around an extra 16 bytes per object, for 376 bytes in total saved per ScanInfo.
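To make the arithmetic above concrete, here is a small sketch of the calculation. The class name and the per-object overhead constant are illustrative assumptions, not code from the patch; the 16-byte java.io.File overhead is the rough estimate from the comment above.

```java
// Illustrative sketch of the per-ScanInfo savings estimate.
// The 86-character volume prefix is the one from memory-analysis.png.
public class ScanInfoSavings {
    public static void main(String[] args) {
        String volumePrefix =
            "/home/cmccabe/hadoop4/hadoop-hdfs-project/hadoop-hdfs/build//test/data/dfs/data/data1/";

        int charsSaved = volumePrefix.length();       // 86 characters per string
        int bytesPerChar = 2;                         // Java strings use UCS-2 internally
        int pathsPerScanInfo = 2;                     // both blockPath and metaPath

        int stringBytes = charsSaved * bytesPerChar * pathsPerScanInfo;  // 344 bytes
        int fileObjectOverhead = 16 * 2;              // assumed ~16 bytes per File wrapper

        // Total estimated savings per ScanInfo: 376 bytes
        System.out.println(stringBytes + fileObjectOverhead);
    }
}
```

Multiplied across the millions of blocks a large DataNode can hold, a few hundred bytes per ScanInfo adds up quickly.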
You might think that /home/cmccabe/hadoop4/hadoop-hdfs-project/hadoop-hdfs/build//test/data/dfs/data/data1/ is an unrealistically long volume path, but here is an example of a real volume path in use on a production cluster:
Putting the disk UUID into the volume path is an obvious thing to do if you're a system administrator.