Description
BlockReportLeaseManager#checkLease rejects full block reports (FBRs) from DNs under conditions such as "unknown datanode", "not in pending set", "lease has expired", or a wrong lease id. A lease rejection does not throw an exception. Instead, checkLease returns false, which bubbles up to NameNodeRpcServer#blockReport and is interpreted as noStaleStorages.
A re-registering node whose FBR is rejected because of an invalid lease becomes active with no blocks. A replication storm ensues, possibly causing DNs to temporarily go dead (HDFS-12645), which leads to more FBR lease rejections on re-registration. The cluster will report many "missing blocks" until each DN's next FBR is sent or forced.
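The failure mode described above can be illustrated with a minimal, self-contained Java sketch. It is not the Hadoop source: the class, the method names other than checkLease, and the Outcome enum are hypothetical and exist only for illustration. The first method models a boolean return value that conflates "lease rejected" with a normal report result; the second shows one possible way to make the rejection visible to the caller, assuming the caller is able to act on it.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of the problem; NOT the Hadoop implementation. */
class LeaseRejectionSketch {

    /** Hypothetical stand-in for a lease manager: one lease id per datanode. */
    static class LeaseManager {
        private final Map<String, Long> leases = new HashMap<>();

        void grant(String datanode, long leaseId) {
            leases.put(datanode, leaseId);
        }

        /** Returns false on rejection instead of throwing an exception. */
        boolean checkLease(String datanode, long leaseId) {
            Long expected = leases.get(datanode);
            return expected != null && expected == leaseId;
        }
    }

    /** Hypothetical explicit result type, so a rejection cannot be misread. */
    enum Outcome { PROCESSED, LEASE_REJECTED }

    /**
     * Problematic shape: the boolean is meant to say "no stale storages",
     * but a rejected lease also returns false, so the caller cannot tell
     * that the report was silently dropped.
     */
    static boolean processReportConflated(LeaseManager lm, String dn, long leaseId) {
        if (!lm.checkLease(dn, leaseId)) {
            return false; // report dropped without any signal to the sender
        }
        // ... the block report would be processed here ...
        return true; // report processed, no stale storages (simplified)
    }

    /** Alternative shape: the rejection is reported as its own outcome. */
    static Outcome processReportExplicit(LeaseManager lm, String dn, long leaseId) {
        if (!lm.checkLease(dn, leaseId)) {
            return Outcome.LEASE_REJECTED; // caller can request a retry
        }
        // ... the block report would be processed here ...
        return Outcome.PROCESSED;
    }

    public static void main(String[] args) {
        LeaseManager lm = new LeaseManager();
        lm.grant("dn1", 7L);
        // Wrong lease id: the conflated variant only yields a bare boolean.
        System.out.println(processReportConflated(lm, "dn1", 8L));
        // The explicit variant names the rejection.
        System.out.println(processReportExplicit(lm, "dn1", 8L));
    }
}
```

With an explicit outcome (or an exception), the sender could learn that its FBR was not applied and resend it promptly rather than waiting for the next scheduled report. Whether that matches the fix adopted for this issue is not stated in the description.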
Attachments
Issue Links
- causes
  - HDFS-14723 Add helper method FSNamesystem#setBlockManagerForTesting() in branch-2 (Resolved)
- is broken by
  - HDFS-7923 The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages (Resolved)
- is duplicated by
  - HDFS-14208 A large number missingblocks happend after failover to active. (Resolved)
- is related to
  - HDFS-14314 fullBlockReportLeaseId should be reset after registering to NN (Resolved)
  - HDFS-14171 Performance improvement in Tailing EditLog (Resolved)
- relates to
  - HDFS-14725 Backport HDFS-12914 to branch-2 (Block report leases cause missing blocks until next report) (Resolved)