
Key: HADOOP-3248
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: dhruba borthakur
Reporter: girish vaitheeswaran
Votes: 0
Watchers: 4

Hadoop Common

Improve Namenode startup performance

Created: 14/Apr/08 06:02 PM   Updated: 08/Jul/09 04:43 PM
Component/s: None
Affects Version/s: None
Fix Version/s: 0.18.0

Time Tracking:
Not Specified

File Attachments:
  Name                 Uploaded              By                    Size  ASF license
  fastRestarts.patch   2008-04-24 06:00 PM   dhruba borthakur      9 kB  yes
  fastRestarts.patch   2008-04-24 09:08 AM   dhruba borthakur      9 kB  yes
  fastRestarts2.patch  2008-04-30 08:22 AM   dhruba borthakur      6 kB  yes
  fastRestarts3.patch  2008-05-06 05:11 AM   dhruba borthakur      7 kB  yes
  fastRestarts3.patch  2008-05-04 08:13 AM   dhruba borthakur      7 kB  yes
  fastRestarts3.patch  2008-05-02 09:54 PM   dhruba borthakur      7 kB  yes
  FSImage.patch        2008-04-21 11:34 PM   girish vaitheeswaran  2 kB  not indicated
Issue Links:
Incorporates
 
Reference
 

Hadoop Flags: Reviewed
Resolution Date: 06/May/08 06:14 PM


Description
Namenode scalability work needs to address HDFS recovery performance, especially when the number of files is large. Some clusters have around 20 million files, and in such cases the time taken for namenode startup is prohibitive. The benchmark numbers below show namenode startup time; they do not include the time to process block reports.

Default scenario for 20 million files, with the maximum Java heap size set to 14 GB: 40 minutes

After tuning Java options such as young generation size, parallel garbage collection, and initial Java heap size: 14 minutes

As can be seen, 14 minutes is still a long time for the namenode to recover, and code changes are required to bring this time down further. To this end, some prototype optimizations were made. Timing analysis showed that saveImage and loadFSImage were the methods consuming most of the time, and that most of that time was spent on object allocations. The goal of the optimizations is to reduce the number of memory allocations as much as possible.

Optimization 1: saveImage()
======================
Avoid allocation of the UTF8 object.

Old code
=======
new UTF8(fullName).write(out);

New Code
========
out.writeUTF(fullName);
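
A minimal sketch of the idea behind this change, using only java.io: each name goes straight to the stream through DataOutput.writeUTF, so no wrapper object is created per file. The class and method names are hypothetical, and the sketch makes no claim that the bytes match what Hadoop's UTF8 class writes, which would matter if existing images must stay readable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical stand-in for the name-writing part of saveImage(); not Hadoop code.
class WriteUtfSketch {

    // Writes a count followed by each name, with no per-name wrapper object.
    static byte[] writeNames(String[] names) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(names.length);
            for (String name : names) {
                out.writeUTF(name); // 2-byte length prefix, then modified UTF-8 bytes
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Reads the names back with the matching DataInput.readUTF call.
    static String[] readNames(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String[] names = new String[in.readInt()];
            for (int i = 0; i < names.length; i++) {
                names[i] = in.readUTF();
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] names = {"/user/alice/part-00000", "/tmp/job.xml"};
        String[] back = readNames(writeNames(names));
        System.out.println("round trip ok: " + Arrays.equals(names, back));
    }
}
```

Note that writeUTF rejects strings whose encoded form exceeds 65535 bytes (it throws UTFDataFormatException), so very long paths would need separate handling.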

Optimization 2: saveImage()
======================
Avoid allocating the PermissionStatus object and the FsPermission object. This applies to both directories and files.

Old code
=======
fileINode.getPermissionStatus().write(out);

New Code
=========
out.writeBytes(fileINode.getUserName());
out.writeBytes(fileINode.getGroupName());
out.writeShort(fileINode.getFsPermission().toShort());
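
The field-by-field write above can be sketched with plain java.io. The record layout and all names below are assumptions for illustration, not the actual FSImage format. One caveat: DataOutput.writeBytes emits one byte per character with no length prefix, so a reader cannot tell where the user name ends unless the length is recorded some other way. The sketch therefore compares that variant with a length-prefixed one based on writeUTF.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record writer; not Hadoop code.
class PermissionWriteSketch {

    // Length-prefixed strings: the record can be parsed back unambiguously.
    static byte[] writePrefixed(String user, String group, short mode) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(user);
            out.writeUTF(group);
            out.writeShort(mode);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Raw bytes, as with writeBytes: smaller, but carries no field boundaries.
    static byte[] writeRaw(String user, String group, short mode) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeBytes(user);
            out.writeBytes(group);
            out.writeShort(mode);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        short mode = (short) 0644;
        System.out.println("prefixed bytes: " + writePrefixed("hdfs", "supergroup", mode).length);
        System.out.println("raw bytes:      " + writeRaw("hdfs", "supergroup", mode).length);
    }
}
```

Either way, no permission wrapper object is allocated per inode; only the two strings already held by the inode and one short are written.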

Optimization 3
============
loadImage() could use the same mechanism and avoid allocating the PermissionStatus object and the FsPermission object.
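
On the load side, the same idea might look like the sketch below: the three permission fields are read straight into the node being built, with no intermediate permission object. NodeSketch, the helper names, and the record layout (two length-prefixed strings followed by a short) are all assumptions made for this example, not Hadoop's classes or image format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical loader; not Hadoop code.
class PermissionReadSketch {

    // Stand-in for an inode that stores its permission fields directly.
    static final class NodeSketch {
        String user;
        String group;
        short mode;
    }

    // Builds a sample record in the assumed layout.
    static byte[] encode(String user, String group, short mode) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(user);
            out.writeUTF(group);
            out.writeShort(mode);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Reads the fields directly into the node; no wrapper object is created.
    static NodeSketch load(byte[] record) {
        NodeSketch node = new NodeSketch();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
            node.user = in.readUTF();
            node.group = in.readUTF();
            node.mode = in.readShort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return node;
    }

    public static void main(String[] args) {
        NodeSketch n = load(encode("hdfs", "supergroup", (short) 0755));
        System.out.println(n.user + ":" + n.group + " " + Integer.toOctalString(n.mode));
    }
}
```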

Optimization 4
============
A hack was tried out to avoid the cost of object allocation in saveImage(), where the fullName was being constructed using string concatenation. This optimization also improved performance.
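
The issue does not say what the hack was. As an illustration only, one common way to cut concatenation cost during a tree walk is to share a single StringBuilder: append the child name on the way down and truncate back to the parent's length on the way up. The Node class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical directory-tree walk; not the patch attached to this issue.
class PathBuilderSketch {

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Extends the shared builder for this node, visits the children, then
    // restores the builder so siblings start again from the parent's path.
    static void collect(Node node, StringBuilder path, List<String> out) {
        int mark = path.length();
        path.append('/').append(node.name);
        out.add(path.toString()); // one String per node, no intermediate concatenations
        for (Node child : node.children) {
            collect(child, path, out);
        }
        path.setLength(mark);
    }

    // Returns the full path of every node below the (unnamed) root, in pre-order.
    static List<String> allPaths(Node root) {
        List<String> out = new ArrayList<>();
        StringBuilder path = new StringBuilder();
        for (Node child : root.children) {
            collect(child, path, out);
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node("")
            .add(new Node("user").add(new Node("alice")).add(new Node("bob")))
            .add(new Node("tmp"));
        System.out.println(allPaths(root).equals(
            Arrays.asList("/user", "/user/alice", "/user/bob", "/tmp")));
    }
}
```

This still allocates one String per node in toString(); removing that as well would require writing the builder's characters to the stream directly.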

Overall, these optimizations brought the startup time down to slightly over 7 minutes. Most of the remaining time is now spent in loadFSImage(), since we allocate the INode and INodeDirectory objects there. Any further optimizations will need to focus on loadFSImage().


