Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
It has been reported that for large clusters (2K datanodes), a restarted namenode can often take hours to leave safe-mode.
- Admins have reported that if the datanodes are started in batches, say 100 at a time, it significantly improves the startup time of the namenode.
- Setting the initial heap (as opposed to only the max heap) to a larger value also helps - this avoids the GCs that would otherwise occur as more memory is added to the heap during startup (see the sketch below these bullets).
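For illustration, a minimal sketch of the heap-tuning point, assuming the NameNode JVM options are set via HADOOP_NAMENODE_OPTS in conf/hadoop-env.sh; the 16g heap size here is made up and should be replaced with the value appropriate for the cluster:

    # Assumption: NameNode heap tuned via HADOOP_NAMENODE_OPTS in conf/hadoop-env.sh.
    # Setting -Xms equal to -Xmx allocates the full heap up front, so the JVM does
    # not run extra GC cycles while the heap grows during startup.
    export HADOOP_NAMENODE_OPTS="-Xms16g -Xmx16g ${HADOOP_NAMENODE_OPTS}"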
Observations of the Name node via JConsole and instrumentation:
- If 80% of memory is used for maintaining the names and blocks data structures, then processing block reports can generate a lot of GC, causing block reports to take a long time to process. This causes the datanodes that sent the block reports to time out and resend them, making the situation worse.
Hence, to improve the situation, the following are proposed:
1. Have random backoffs (of, say, 60 sec for a 1K-node cluster) for the initial block report sent by a DN. This would match the randomization of the normal hourly block reports. (Jira HADOOP-2326; a sketch of the randomized delay follows this list.)
2. Have the NN tell the DN how much to backoff (i.e. rather than a single configuration parameter for the backoff). This would allow the system to adjust automatically to cluster size - smaller clusters will startup faster than larger clusters. (Jira HADOOP-2444)
3. Change the block reports to be an array of longs rather than an array of block report objects - this would reduce the amount of memory used to process a block report. This would help the initial startup and also block report processing during normal operation outside of safe-mode. (Jira HADOOP-2110; a sketch of this encoding appears after the discussion below.)
4. Queue and acknowledge the receipts of the block reports and have separate set of threads process the block report queue. (HADOOP-2111)
5. Incremental block reports. (Jira HADOOP-1079)
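To illustrate item 1, here is a minimal sketch (not the actual DataNode code) of picking a randomized delay for the first block report; the configuration name dfs.blockreport.initialDelay and the class and method names are assumptions for illustration:

    import java.util.Random;

    // Minimal sketch of item 1: each datanode waits a random amount of time,
    // bounded by a configured maximum, before sending its first block report,
    // so a cluster-wide restart does not flood the namenode all at once.
    public class InitialBlockReportBackoff {
        private final Random random = new Random();

        // maxDelaySeconds would come from configuration, e.g. something along
        // the lines of dfs.blockreport.initialDelay (name assumed here).
        public long initialDelayMillis(long maxDelaySeconds) {
            if (maxDelaySeconds <= 0) {
                return 0; // backoff disabled
            }
            return (long) (random.nextDouble() * maxDelaySeconds * 1000L);
        }
    }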
Five Jiras have been filed, as noted above.
Based on experiments, we may not want to proceed with option 4. While option 4 did help block report processing when tried on its own, it turned out that in combination with option 1 it did not help much. Furthermore, cleanup of RPC to remove the client-side timeout (see Jira HADOOP-2188) would make this fix obsolete.
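As a rough illustration of item 3, the sketch below packs a block report into a flat long[] instead of an array of block objects; the three-longs-per-block layout (block id, number of bytes, generation stamp) is an assumption for illustration and not necessarily the exact format adopted in HADOOP-2110:

    // Sketch of item 3: a block report encoded as a flat array of longs.
    // Each block contributes three longs: block id, number of bytes, generation stamp.
    public class BlockReportAsLongs {
        public static final int LONGS_PER_BLOCK = 3;

        // blocks[i] = {blockId, numBytes, generationStamp}
        public static long[] encode(long[][] blocks) {
            long[] report = new long[blocks.length * LONGS_PER_BLOCK];
            for (int i = 0; i < blocks.length; i++) {
                int base = i * LONGS_PER_BLOCK;
                report[base] = blocks[i][0];
                report[base + 1] = blocks[i][1];
                report[base + 2] = blocks[i][2];
            }
            return report;
        }

        public static int numberOfBlocks(long[] report) {
            return report.length / LONGS_PER_BLOCK;
        }
    }

The intent of such an encoding is that the report serializes compactly and the namenode can walk the array without allocating one object per block while processing it.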
Issue Links
- is related to: HDFS-1667 Consider redesign of block report processing (Open)