We recently examined the NameNode heap dump of a large cluster that uses snapshots heavily, looking for memory to trim, and sure enough we found a memory leak: when snapshots are removed, the corresponding data structures are not removed.
This cluster has 586 million file system objects (286 million files, 287 million blocks, 13 million directories) and uses around 132 GB of heap.
While only 44.5 million files have snapshotted copies (INodeFileAttributes$SnapshotCopy), most inodes (nearly 212 million) have a FileWithSnapshotFeature and a FileDiffList. Those inodes had snapshotted copies at some point in the past, but after the snapshots were removed, those data structures were still kept in the heap.
INode$Feature = 32.5 bytes on average, FileWithSnapshotFeature = 32 bytes, FileDiffList = 24 bytes. That may not sound like a lot, but it adds up quickly in large clusters like this one. In this cluster, a whopping 13.8 GB of memory could have been saved if not for this bug: (32.5 + 32 + 24) bytes * (211997769 - 44572380) =~ 13.8 GB. That is more than 10% of the heap size.
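For anyone who wants to redo the estimate with their own histogram numbers, here is a minimal sketch of the arithmetic above. The class and method names are made up for illustration; only the sizes and counts come from this report. Note that the 13.8 figure comes out when dividing by 1024^3.

```java
import java.util.Locale;

public class SnapshotHeapEstimate {
    // Average per-object sizes in bytes, taken from the numbers in this report.
    static final double INODE_FEATURE_BYTES = 32.5;
    static final double FILE_WITH_SNAPSHOT_FEATURE_BYTES = 32.0;
    static final double FILE_DIFF_LIST_BYTES = 24.0;

    /**
     * Estimates heap held by inodes that still carry snapshot structures
     * but no longer have a snapshotted copy.
     */
    static double wastedBytes(long inodesWithFeature, long inodesWithSnapshotCopy) {
        long staleInodes = inodesWithFeature - inodesWithSnapshotCopy;
        double perInode = INODE_FEATURE_BYTES
                + FILE_WITH_SNAPSHOT_FEATURE_BYTES
                + FILE_DIFF_LIST_BYTES;
        return staleInodes * perInode;
    }

    public static void main(String[] args) {
        double bytes = wastedBytes(211_997_769L, 44_572_380L);
        double gib = bytes / (1024.0 * 1024.0 * 1024.0);
        System.out.println(String.format(Locale.ROOT, "estimated waste: %.1f GB", gib));
    }
}
```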
Heap histogram for reference:
I am thinking that AbstractINodeDiffList#deleteSnapshotDiff(), in addition to cleaning up file diffs, should also remove the FileWithSnapshotFeature. I am not familiar with the snapshot implementation, so any guidance is greatly appreciated.
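To make the idea concrete, here is a toy model of the cleanup I have in mind. This is not HDFS code: ToyFile and ToySnapshotFeature are hypothetical stand-ins, and the real condition for dropping the feature is probably more involved than "the diff list is empty".

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotFeatureCleanupSketch {
    /** Hypothetical stand-in for a per-file snapshot feature that owns a diff list. */
    static class ToySnapshotFeature {
        final List<Integer> diffSnapshotIds = new ArrayList<>();
    }

    /** Hypothetical stand-in for a file inode. */
    static class ToyFile {
        ToySnapshotFeature snapshotFeature; // null when the file has no snapshot state

        void recordDiff(int snapshotId) {
            if (snapshotFeature == null) {
                snapshotFeature = new ToySnapshotFeature();
            }
            snapshotFeature.diffSnapshotIds.add(snapshotId);
        }

        void deleteSnapshotDiff(int snapshotId) {
            if (snapshotFeature == null) {
                return;
            }
            // Remove the diff that belongs to the deleted snapshot.
            snapshotFeature.diffSnapshotIds.remove(Integer.valueOf(snapshotId));
            // Proposed extra step: drop the feature once no diffs remain,
            // so the feature and its empty list can be garbage collected.
            if (snapshotFeature.diffSnapshotIds.isEmpty()) {
                snapshotFeature = null;
            }
        }
    }

    public static void main(String[] args) {
        ToyFile file = new ToyFile();
        file.recordDiff(1);
        file.recordDiff(2);
        file.deleteSnapshotDiff(1);
        System.out.println("feature kept after first delete: " + (file.snapshotFeature != null));
        file.deleteSnapshotDiff(2);
        System.out.println("feature kept after last delete: " + (file.snapshotFeature != null));
    }
}
```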