Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
It has been reported that for large clusters (2K datanodes), a restarted namenode can often take hours to leave safe-mode.
- Admins have reported that if the datanodes are started in batches, say 100 at a time, it significantly improves the startup time of the namenode.
- Setting the initial heap (as opposed to only the max heap) to a larger value also helps - this avoids the GCs that would otherwise occur as more memory is added to the heap during startup (see the sketch below these bullets).
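For illustration, a minimal sketch of the heap-tuning point, assuming the NameNode JVM options are set via HADOOP_NAMENODE_OPTS in conf/hadoop-env.sh; the 16g heap size here is made up and should be replaced with the value appropriate for the cluster:

    # Assumption: NameNode heap tuned via HADOOP_NAMENODE_OPTS in conf/hadoop-env.sh.
    # Setting -Xms equal to -Xmx allocates the full heap up front, so the JVM does
    # not run extra GC cycles while the heap grows during startup.
    export HADOOP_NAMENODE_OPTS="-Xms16g -Xmx16g ${HADOOP_NAMENODE_OPTS}"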
Observations of the Name node via JConsole and instrumentation:
- If 80% of memory is used for maintaining the names and blocks data structures, then processing block reports can generate a lot of GC, causing block reports to take a long time to process. This causes the datanodes that sent the block reports to time out and resend them, making the situation worse.
Hence, to improve the situation, the following are proposed:
1. Have random backoffs (of, say, 60 sec for a 1K-node cluster) for the initial block report sent by a DN. This would match the randomization of the normal hourly block reports. (Jira HADOOP-2326; a sketch of the randomized delay follows this list.)
2. Have the NN tell the DN how much to backoff (i.e. rather than a single configuration parameter for the backoff). This would allow the system to adjust automatically to cluster size - smaller clusters will startup faster than larger clusters. (Jira HADOOP-2444)
3. Change the block reports to be an array of longs rather than an array of block report objects - this would reduce the amount of memory used to process a block report. This would help the initial startup and also block report processing during normal operation outside of safe-mode. (Jira HADOOP-2110; a sketch of this encoding appears after the discussion below.)
4. Queue and acknowledge the receipts of the block reports and have separate set of threads process the block report queue. (HADOOP-2111)
5. Incremental block reports. (Jira HADOOP-1079)
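To illustrate item 1, here is a minimal sketch (not the actual DataNode code) of picking a randomized delay for the first block report; the configuration name dfs.blockreport.initialDelay and the class and method names are assumptions for illustration:

    import java.util.Random;

    // Minimal sketch of item 1: each datanode waits a random amount of time,
    // bounded by a configured maximum, before sending its first block report,
    // so a cluster-wide restart does not flood the namenode all at once.
    public class InitialBlockReportBackoff {
        private final Random random = new Random();

        // maxDelaySeconds would come from configuration, e.g. something along
        // the lines of dfs.blockreport.initialDelay (name assumed here).
        public long initialDelayMillis(long maxDelaySeconds) {
            if (maxDelaySeconds <= 0) {
                return 0; // backoff disabled
            }
            return (long) (random.nextDouble() * maxDelaySeconds * 1000L);
        }
    }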
Five Jiras have been filed, as noted above.
Based on experiments, we may not want to proceed with option 4. While option 4 did help block report processing when tried on its own, it turned out that in combination with option 1 it did not help much. Furthermore, cleanup of RPC to remove the client-side timeout (see Jira HADOOP-2188) would make this fix obsolete.
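As a rough illustration of item 3, the sketch below packs a block report into a flat long[] instead of an array of block objects; the three-longs-per-block layout (block id, number of bytes, generation stamp) is an assumption for illustration and not necessarily the exact format adopted in HADOOP-2110:

    // Sketch of item 3: a block report encoded as a flat array of longs.
    // Each block contributes three longs: block id, number of bytes, generation stamp.
    public class BlockReportAsLongs {
        public static final int LONGS_PER_BLOCK = 3;

        // blocks[i] = {blockId, numBytes, generationStamp}
        public static long[] encode(long[][] blocks) {
            long[] report = new long[blocks.length * LONGS_PER_BLOCK];
            for (int i = 0; i < blocks.length; i++) {
                int base = i * LONGS_PER_BLOCK;
                report[base] = blocks[i][0];
                report[base + 1] = blocks[i][1];
                report[base + 2] = blocks[i][2];
            }
            return report;
        }

        public static int numberOfBlocks(long[] report) {
            return report.length / LONGS_PER_BLOCK;
        }
    }

The intent of such an encoding is that the report serializes compactly and the namenode can walk the array without allocating one object per block while processing it.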
Issue Links
- is related to: HDFS-1667 Consider redesign of block report processing (Open)