|
[
Permlink
| « Hide
]
Raghu Angadi added a comment - 06/Jul/07 05:47 PM
Last time I checked couple of months back, file name String somehow ended up using 128 byte array. Could you double check? Milind noticed that this might be because of using substring() to get file name from full path. If this is the case then, this can save around 100 bytes per file.
This patch removes the TreeMap for every HDFS directory, instead replaces it with a ArrayList. The FSDirectory code does a binary lookup on the ArrayList.
I measured that with 10M directories (with a fanout of 5 sub-directories per parent directory), the TreeMap occupies a total heap space of 1120 MB where the ArrayList implementation requires only 612MB. A whopping 50% improvement! Are there only directories? I think 50% is for a directory inode. When we consider all INodes, it would reduce around 50-60 bytes per INode on a 64 bit machine, a 12-15% of INode memory. 15% is also pretty large of course.
I measured only directories using an artificial program. You are absolutely right in saying that there are usually far more files than directories. The portion of memory occupied by directories will be reduced by almost half. But the overall total memory usage of the namenode will not reduce by 50%.
To add to this, irrespective of how many directories or files, memory reduced per INode is in the patch is 'sizeof(TreeMap.Entry) - sizeof(reference)'. this is true even if there are only directories in the namespace.
Also how did you measure memory for ArrayList alone? 600M for 10M (across many ArrayLists) seems pretty large. Merged patch with latest trunk.
Incorporated Konstantin's review comments.
This patch replaces each TreeMap in a directory with an ArrayList.
-1, build or testing failed
2 attempts failed to build and test the latest attachment http://issues.apache.org/jira/secure/attachment/12363003/memoryReduction3.patch Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/510/testReport/ Please note that this message is automatically generated and may represent a problem with the automation system and not the patch. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||