[HDFS-14559] Optimizing safemode leave mechanism - ASF JIRA


Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: namenode
Labels: None

Description

As mentioned in HDFS-14186, in the last stage of NameNode startup the NameNode leaves safemode once the number of reported blocks reaches a threshold. However, this condition is based entirely on the total number of blocks rather than the total number of replicas. So in a large cluster, even after all blocks have been reported by the DataNodes, a large number of block replicas are still waiting to be reported, and the NameNode load stays high for a long time. In some extreme cases, between leaving safemode and finishing block report processing, the NameNode cannot provide normal service, and some DataNodes may be considered dead and then re-register and send block reports again and again.
In short, we need to improve the safemode leave mechanism so that large clusters can restart smoothly.
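To make the gap concrete, here is a minimal Java sketch contrasting a block-based exit check with a replica-aware one. It assumes, as in stock HDFS, that a block counts toward the safemode threshold once a minimum number of replicas (1 by default) has been reported, and that the threshold is a fraction of total blocks (0.999 by default, via `dfs.namenode.safemode.threshold-pct`). `BlockState`, `canLeaveByBlocks`, `canLeaveByReplicas` and `replicaThresholdPct` are hypothetical names for illustration, not HDFS APIs, and the replica-aware check is only one possible direction, not a design adopted by this issue.

```java
import java.util.List;

/**
 * Hypothetical model of the safemode exit condition.
 * None of these names exist in HDFS; they only illustrate the block-vs-replica gap.
 */
public class SafeModeSketch {

    /** One block: replicas expected by the NameNode vs. replicas reported so far. */
    record BlockState(int expectedReplicas, int reportedReplicas) {}

    /** Block-based check: a block is "safe" once minReplicas of its replicas are reported. */
    static boolean canLeaveByBlocks(List<BlockState> blocks, double thresholdPct, int minReplicas) {
        long safe = blocks.stream().filter(b -> b.reportedReplicas() >= minReplicas).count();
        return safe >= thresholdPct * blocks.size();
    }

    /** Hypothetical replica-aware check: also require a fraction of all expected replicas. */
    static boolean canLeaveByReplicas(List<BlockState> blocks, double thresholdPct,
                                      int minReplicas, double replicaThresholdPct) {
        long expected = blocks.stream().mapToLong(BlockState::expectedReplicas).sum();
        long reported = blocks.stream().mapToLong(BlockState::reportedReplicas).sum();
        return canLeaveByBlocks(blocks, thresholdPct, minReplicas)
                && reported >= replicaThresholdPct * expected;
    }

    public static void main(String[] args) {
        // Every block has 1 of 3 replicas reported: all blocks are "safe",
        // yet two thirds of the replicas are still pending report.
        List<BlockState> blocks = List.of(
                new BlockState(3, 1), new BlockState(3, 1),
                new BlockState(3, 1), new BlockState(3, 1));
        System.out.println(canLeaveByBlocks(blocks, 0.999, 1));        // prints "true"
        System.out.println(canLeaveByReplicas(blocks, 0.999, 1, 0.9)); // prints "false"
    }
}
```

In this toy cluster the block-based check already allows leaving safemode while most replicas are still unreported, which is the situation the description attributes to large clusters after restart.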

People

Assignee: Xiaoqiao He

Reporter: Xiaoqiao He

Votes: 0

Watchers: 4

Dates

Created: 11/Jun/19 09:34

Updated: 02/Oct/19 17:14