Affects Version/s: None
Fix Version/s: 3.3.0
This Jira builds the DeadNodeDetector state machine model. It implements the following:
- When a DFSInputStream is opened, a BlockReader is opened. If a DataNode holding the block is found to be inaccessible, the DataNode is put into DeadNodeDetector#deadnode. (HDFS-14649) will optimize this part: when a DataNode is not accessible, it is likely that the replica has been removed from that DataNode, so this needs to be confirmed by re-probing and handled with higher priority.
- DeadNodeDetector periodically probes the nodes in DeadNodeDetector#deadnode. If a node can be accessed successfully, it is removed from DeadNodeDetector#deadnode. Continuous probing of dead nodes is necessary because a DataNode may rejoin the cluster after a service restart or machine repair. Without this probe mechanism, such a DataNode could be excluded permanently.
- DeadNodeDetector#dfsInputStreamNodes records which DataNodes each DFSInputStream is using. When a DFSInputStream is closed, it is removed from DeadNodeDetector#dfsInputStreamNodes.
- Every time the global dead nodes are fetched, DeadNodeDetector#deadnode is updated. The new DeadNodeDetector#deadnode equals the intersection of the old DeadNodeDetector#deadnode and the DataNodes referenced by DeadNodeDetector#dfsInputStreamNodes.
- DeadNodeDetector has a switch that is turned off by default. When it is off, each DFSInputStream still uses its own local dead nodes.
- This feature has been used in the XIAOMI production environment for a long time. It has reduced HBase reads getting stuck due to node hangs.
- To use the feature, just turn on the DeadNodeDetector switch. There are no other restrictions. If you do not want to use DeadNodeDetector, just turn it off.
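
The bookkeeping described above can be illustrated with a minimal Java sketch. It is not the actual HDFS client code: the class name DeadNodeModelSketch, its method names, and the use of plain strings for stream and DataNode identifiers are all hypothetical, chosen only to show how the dead node set, the per-stream node map, the re-probe, and the intersection rule could fit together.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

/** Hypothetical, simplified model of the state described above; not the real DeadNodeDetector. */
class DeadNodeModelSketch {
    // Plays the role of DeadNodeDetector#deadnode: nodes currently believed dead.
    private final Set<String> deadNodes = new HashSet<>();
    // Plays the role of DeadNodeDetector#dfsInputStreamNodes: stream id -> DataNodes it uses.
    private final Map<String, Set<String>> streamNodes = new HashMap<>();

    /** Record the DataNodes used by a newly opened stream. */
    synchronized void openStream(String streamId, Set<String> dataNodes) {
        streamNodes.put(streamId, new HashSet<>(dataNodes));
    }

    /** Forget a stream when it is closed. */
    synchronized void closeStream(String streamId) {
        streamNodes.remove(streamId);
    }

    /** A reader found this DataNode inaccessible. */
    synchronized void reportDead(String dataNode) {
        deadNodes.add(dataNode);
    }

    /** One probe round: nodes that answer again leave the dead set. */
    synchronized void probe(Predicate<String> isReachable) {
        deadNodes.removeIf(isReachable);
    }

    /** Intersect the dead set with the nodes still used by open streams, then return a sorted copy. */
    synchronized Set<String> getDeadNodes() {
        Set<String> inUse = new HashSet<>();
        for (Set<String> nodes : streamNodes.values()) {
            inUse.addAll(nodes);
        }
        deadNodes.retainAll(inUse);
        return new TreeSet<>(deadNodes);
    }

    public static void main(String[] args) {
        DeadNodeModelSketch model = new DeadNodeModelSketch();
        model.openStream("s1", Set.of("dn1", "dn2"));
        model.openStream("s2", Set.of("dn3"));
        model.reportDead("dn1");
        model.reportDead("dn3");
        System.out.println(model.getDeadNodes()); // [dn1, dn3]

        model.probe(node -> node.equals("dn3"));  // dn3 is reachable again
        System.out.println(model.getDeadNodes()); // [dn1]

        model.closeStream("s1");                  // no open stream uses dn1 any more
        System.out.println(model.getDeadNodes()); // []
    }
}
```

In this sketch the probe is triggered by the caller; the periodic scheduling and the on/off switch described above are left out to keep the example short.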