Experiments have demonstrated that a single file/block needs about 300 to 500 bytes of main memory on a 64-bit Namenode. This puts some limitations on the size of the file system that a single namenode can support. Most of this overhead occurs because a block and/or filename is inserted into multiple TreeMaps and/or HashSets.
Here are a few ideas that can be measured to see if an appreciable reduction of memory usage occurs:
1. Change FSDirectory.children from a TreeMap to an array. Do binary search in this array while looking up children. This saves a TreeMap object for every intermediate node in the directory tree.
2. Change INode from an inner class. This saves on one "parent object" reference for each INODE instance. 4 bytes per inode.
3. Keep all DatanodeDescriptors in an array. BlocksMap.nodes is currently a 64-bit reference to the DatanodeDescriptor object. Instead, it can be a 'short'. This will probably save about 16 bytes per block.
4. Change DatanodeDescriptor.blocks from a SortedTreeMap to a HashMap? Block report processing CPU cost can increase.
For the records: TreeMap has the following fields:
Entry left = null;
Entry right = null;
boolean color = BLACK;
and HashMap object:
final Object key;
final int hash;