Issue Details (XML | Word | Printable)

Key: HADOOP-1565
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: dhruba borthakur
Reporter: dhruba borthakur
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

DFSScalability: reduce memory usage of namenode

Created: 05/Jul/07 06:43 PM   Updated: 08/Jul/09 04:42 PM
Return to search
Component/s: None
Affects Version/s: None
Fix Version/s: 0.15.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works memoryReduction3.patch 2007-08-01 09:57 PM dhruba borthakur 5 kB

Resolution Date: 03/Aug/07 08:20 PM


 Description  « Hide
Experiments have demonstrated that a single file/block needs about 300 to 500 bytes of main memory on a 64-bit Namenode. This puts some limitations on the size of the file system that a single namenode can support. Most of this overhead occurs because a block and/or filename is inserted into multiple TreeMaps and/or HashSets.

Here are a few ideas that can be measured to see if an appreciable reduction of memory usage occurs:

1. Change FSDirectory.children from a TreeMap to an array. Do binary search in this array while looking up children. This saves a TreeMap object for every intermediate node in the directory tree.
2. Change INode from an inner class. This saves on one "parent object" reference for each INODE instance. 4 bytes per inode.
3. Keep all DatanodeDescriptors in an array. BlocksMap.nodes[] is currently a 64-bit reference to the DatanodeDescriptor object. Instead, it can be a 'short'. This will probably save about 16 bytes per block.
4. Change DatanodeDescriptor.blocks from a SortedTreeMap to a HashMap? Block report processing CPU cost can increase.

For the records: TreeMap has the following fields:
Object key;
Object value;
Entry left = null;
Entry right = null;
Entry parent;
boolean color = BLACK;

and HashMap object:
final Object key;
Object value;
final int hash;
Entry next;



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Raghu Angadi added a comment - 06/Jul/07 05:47 PM
Last time I checked couple of months back, file name String somehow ended up using 128 byte array. Could you double check? Milind noticed that this might be because of using substring() to get file name from full path. If this is the case then, this can save around 100 bytes per file.

dhruba borthakur added a comment - 23/Jul/07 05:45 PM
This patch removes the TreeMap for every HDFS directory, instead replaces it with a ArrayList. The FSDirectory code does a binary lookup on the ArrayList.

I measured that with 10M directories (with a fanout of 5 sub-directories per parent directory), the TreeMap occupies a total heap space of 1120 MB where the ArrayList implementation requires only 612MB. A whopping 50% improvement!


Raghu Angadi added a comment - 23/Jul/07 06:07 PM
Are there only directories? I think 50% is for a directory inode. When we consider all INodes, it would reduce around 50-60 bytes per INode on a 64 bit machine, a 12-15% of INode memory. 15% is also pretty large of course.

dhruba borthakur added a comment - 23/Jul/07 06:11 PM
I measured only directories using an artificial program. You are absolutely right in saying that there are usually far more files than directories. The portion of memory occupied by directories will be reduced by almost half. But the overall total memory usage of the namenode will not reduce by 50%.

Raghu Angadi added a comment - 23/Jul/07 06:31 PM
To add to this, irrespective of how many directories or files, memory reduced per INode is in the patch is 'sizeof(TreeMap.Entry) - sizeof(reference)'. this is true even if there are only directories in the namespace.

Also how did you measure memory for ArrayList alone? 600M for 10M (across many ArrayLists) seems pretty large.


dhruba borthakur added a comment - 31/Jul/07 07:58 AM
Merged patch with latest trunk.

Konstantin Shvachko added a comment - 01/Aug/07 12:34 AM - edited
  • I agree ArrayList should better serve the purpose than the TreeMap.
    It saves us about 50 bytes per directory entry according to my calculations.
  • hashcode. I don't think waisting 4 bytes per INode plus the complexity of supporting the hash code oriented
    ordering worth the performance gain we get from that. I would compare names as they are same as we did before.
    We are talking about 10-20 entries per directory and file names of length 10 on average.
  • is there a reason for reimplementing binary search rather than using Arrays.binarySearch()?
  • children = new ArrayList<INode>(5);
    5 should be a constant
  • System.out.println() should be removed.
  • Since you are cleaning up DatanodeDescriptor, could you please also remove redundant imports of
    NetworkTopology and net.Node;

dhruba borthakur added a comment - 01/Aug/07 09:57 PM
Incorporated Konstantin's review comments.

Konstantin Shvachko added a comment - 02/Aug/07 06:32 PM
+1

dhruba borthakur added a comment - 02/Aug/07 07:14 PM
This patch replaces each TreeMap in a directory with an ArrayList.

Hadoop QA added a comment - 02/Aug/07 09:34 PM
-1, build or testing failed

2 attempts failed to build and test the latest attachment http://issues.apache.org/jira/secure/attachment/12363003/memoryReduction3.patch against trunk revision r562041.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/510/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/510/console

Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.


dhruba borthakur added a comment - 03/Aug/07 08:20 PM
I just committed this.