[HDFS-16100] HA: Improve performance of Standby node transition to Active - ASF JIRA

Details

Type: Improvement
Status: Patch Available
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.3.1
Fix Version/s: None
Component/s: namenode
Labels: None

Description

The Standby NameNode uses pendingDNMessages to hold postponed block reports. Queued block reports are handled as follows:

If the GS (generation stamp) of a replica is in the future, the Standby NameNode processes the report when the corresponding edit log entry (e.g. add_block) is loaded.
If the replica is corrupted, the Standby NameNode processes the report during its transition to Active.
If a DataNode is removed, its block reports are removed from pendingDNMessages.
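
The three cases above can be sketched with a simplified in-memory model. This is only an illustration of the queueing behavior described in this issue, not the actual HDFS class: `PendingReportsModel`, `Report`, and all method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical, simplified model of a Standby's postponed-report queue. */
final class PendingReportsModel {
    /** One postponed replica report (fields are illustrative). */
    static final class Report {
        final long blockId;
        final long genStamp;
        final String datanodeId;

        Report(long blockId, long genStamp, String datanodeId) {
            this.blockId = blockId;
            this.genStamp = genStamp;
            this.datanodeId = datanodeId;
        }
    }

    private final Map<Long, Queue<Report>> byBlock = new HashMap<>();
    private int count;

    /** Postpone a report until it can be processed. */
    void enqueue(Report r) {
        byBlock.computeIfAbsent(r.blockId, k -> new ArrayDeque<>()).add(r);
        count++;
    }

    /** Case 1: the edit for this block was loaded, so hand back its reports. */
    List<Report> takeForBlock(long blockId) {
        Queue<Report> q = byBlock.remove(blockId);
        if (q == null) {
            return new ArrayList<>();
        }
        count -= q.size();
        return new ArrayList<>(q);
    }

    /** Case 2: transition to Active, so everything still queued is drained. */
    List<Report> takeAll() {
        List<Report> all = new ArrayList<>();
        for (Queue<Report> q : byBlock.values()) {
            all.addAll(q);
        }
        byBlock.clear();
        count = 0;
        return all;
    }

    /** Case 3: a DataNode was removed, so its queued reports are dropped. */
    void removeForDatanode(String datanodeId) {
        Iterator<Queue<Report>> it = byBlock.values().iterator();
        while (it.hasNext()) {
            Queue<Report> q = it.next();
            int before = q.size();
            q.removeIf(r -> r.datanodeId.equals(datanodeId));
            count -= before - q.size();
            if (q.isEmpty()) {
                it.remove();
            }
        }
    }

    int count() {
        return count;
    }
}
```

In this model the cost of `takeAll()` grows linearly with the number of queued reports, which is why a large backlog lengthens the transition.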

Obviously, as the number of corrupted replicas grows, the transition takes longer. In our situation, there were 60 million block reports in pendingDNMessages before the transition. Processing them took almost 7 minutes, and the node was killed by zkfc. Most of these block reports had replica state RBW with a wrong GS (lower than that of the stored block in the Standby NameNode).

In my opinion, the Standby NameNode could ignore block reports whose replica state is RBW with a wrong GS, because the Active NameNode or the DataNode will remove those replicas later.
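
A minimal sketch of the proposed check follows. It is an illustration of the idea only and does not reproduce the attached patch: `StaleRbwFilter`, its methods, and the local `ReplicaState` enum are hypothetical stand-ins, and "wrong GS" is taken to mean a reported generation stamp lower than the stored one, as described above.

```java
/** Hypothetical helper that decides whether a report is worth postponing. */
final class StaleRbwFilter {
    /** Local stand-in for the replica states a DataNode can report. */
    enum ReplicaState { FINALIZED, RBW, RWR, RUR, TEMPORARY }

    private StaleRbwFilter() {
    }

    /** True when the replica is RBW and its GS is older than the stored block's GS. */
    static boolean isStaleRbw(ReplicaState state, long reportedGenStamp, long storedGenStamp) {
        return state == ReplicaState.RBW && reportedGenStamp < storedGenStamp;
    }

    /** The Standby would queue a report only if it is not a stale RBW replica. */
    static boolean shouldQueue(ReplicaState state, long reportedGenStamp, long storedGenStamp) {
        return !isStaleRbw(state, reportedGenStamp, storedGenStamp);
    }
}
```

With this check, a report such as `(RBW, GS 1001)` against a stored block with GS 1005 would be skipped, while a report with a future GS or a different replica state would still be queued.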

Attachments

- HDFS-16100.001.patch (06/Jul/21 12:07, 2 kB, wudeyu)
- HDFS-16100.patch (30/Jun/21 08:57, 2 kB, wudeyu)

People

Assignee: wudeyu

Reporter: wudeyu

Votes: 0

Watchers: 4

Dates

Created: 30/Jun/21 06:38

Updated: 07/Jul/21 20:28