Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
0.23.0, 2.0.0-alpha, 3.0.0-alpha1
None
None
Description
When HeartbeatManager.heartbeatCheck runs:
- All DNs are scanned to count dead nodes
- The first dead node is processed
- If a dead node was found, the loop re-scans all DNs
Processing a dead node holds the namesystem write lock while the node is removed from the blockmap. It also appears to do a lot of work to immediately readjust the replication queues. All of this processing may be too expensive to perform while holding the write lock, e.g. if a rack or two is lost.
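To make the cost of the scan/process/re-scan pattern concrete, here is a rough, standalone Java sketch of the loop as described above. It is not the actual HeartbeatManager code: the Node class, the simulate method, and the lock field are hypothetical stand-ins, and it assumes that a processed dead node is dropped from the list before the next scan. It only counts how many node visits and write-lock acquisitions the pattern produces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class HeartbeatCheckSketch {
    /** Hypothetical stand-in for a datanode descriptor. */
    static final class Node {
        final int id;
        final boolean dead;
        Node(int id, boolean dead) { this.id = id; this.dead = dead; }
    }

    // Hypothetical stand-in for the namesystem lock.
    private static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();

    /** Returns {passes, nodesScanned, writeLockAcquisitions}. */
    static long[] simulate(int totalNodes, int deadNodes) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < totalNodes; i++) {
            nodes.add(new Node(i, i < deadNodes)); // first deadNodes entries are dead
        }
        long passes = 0, scanned = 0, lockAcquisitions = 0;
        boolean allAlive = false;
        while (!allAlive) {
            passes++;
            Node firstDead = null;
            // Full scan of every node, remembering only the first dead one.
            for (Node n : nodes) {
                scanned++;
                if (n.dead && firstDead == null) {
                    firstDead = n;
                }
            }
            if (firstDead != null) {
                // One write-lock acquisition per dead node; the real code would
                // do the blockmap and replication-queue work while holding it.
                LOCK.writeLock().lock();
                try {
                    nodes.remove(firstDead); // assumption: node leaves the scan list
                    lockAcquisitions++;
                } finally {
                    LOCK.writeLock().unlock();
                }
            }
            // A dead node was found, so loop and re-scan everything.
            allAlive = (firstDead == null);
        }
        return new long[] {passes, scanned, lockAcquisitions};
    }

    public static void main(String[] args) {
        long[] r = simulate(10, 3);
        System.out.println("passes=" + r[0] + " scanned=" + r[1] + " writeLocks=" + r[2]);
    }
}
```

Under these assumptions, simulate(10, 3) makes 4 passes, visits 34 nodes, and takes the write lock 3 times. With D dead nodes out of N, the scan work grows roughly as N times (D + 1), which is where losing a rack or two starts to hurt.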