  1. Hadoop HDFS
  2. HDFS-13571 Deadnode detection
  3. HDFS-14648

Implement DeadNodeDetector basic model



    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0
    • Component/s: None
    • Labels:
    • Hadoop Flags:


      This Jira constructs the DeadNodeDetector state machine model. It implements the following functions:

      1. When a DFSInputStream is opened, a BlockReader is opened. If a DataNode holding the block is found to be inaccessible, the DataNode is put into DeadNodeDetector#deadnode. HDFS-14649 will optimize this part: when a DataNode is not accessible, it is likely that the replica has been removed from the DataNode, so this needs to be confirmed by re-probing and requires higher-priority processing.
      2. DeadNodeDetector periodically probes the nodes in DeadNodeDetector#deadnode. If a node can be accessed successfully, it is removed from DeadNodeDetector#deadnode. Continuous probing of dead nodes is necessary because a DataNode may rejoin the cluster after a service restart or machine repair. Without this probe mechanism, the DataNode could be excluded permanently.
      3. DeadNodeDetector#dfsInputStreamNodes records the DataNodes used by each DFSInputStream. When a DFSInputStream is closed, it is removed from DeadNodeDetector#dfsInputStreamNodes.
      4. Every time the global dead nodes are fetched, DeadNodeDetector#deadnode is updated. The new DeadNodeDetector#deadnode equals the intersection of the old DeadNodeDetector#deadnode and the DataNodes referenced by DeadNodeDetector#dfsInputStreamNodes.
      5. DeadNodeDetector has a switch that is turned off by default. When it is off, each DFSInputStream still uses its own local dead nodes.
      6. This feature has been used in the XIAOMI production environment for a long time. It has reduced the number of HBase reads that get stuck because of hung nodes.
      7. To use the feature, just turn on the DeadNodeDetector switch; there are no other restrictions. If you do not want to use DeadNodeDetector, just turn it off.
        if (sharedDeadNodesEnabled && deadNodeDetector == null) {
          deadNodeDetector = new DeadNodeDetector(name);
          deadNodeDetectorThr = new Daemon(deadNodeDetector);
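
The state transitions in items 1 to 4 above can be illustrated with a small, self-contained sketch. This is not the code from the attached patches: the class name DeadNodeModel, its method names, and the use of plain strings for stream and DataNode identifiers are all assumptions made for illustration, and the periodic probe thread is reduced to a single probeOnce call that takes a reachability check as a parameter.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical, simplified model of the description above; not the patch code. */
class DeadNodeModel {
  // Shared dead nodes (stands in for DeadNodeDetector#deadnode).
  private final Set<String> deadNodes = ConcurrentHashMap.newKeySet();
  // Open stream id -> DataNodes it uses (stands in for
  // DeadNodeDetector#dfsInputStreamNodes).
  private final Map<String, Set<String>> streamNodes = new ConcurrentHashMap<>();

  /** Item 3: record that an open stream uses a DataNode. */
  void addNodeToStream(String streamId, String dataNode) {
    streamNodes
        .computeIfAbsent(streamId, k -> ConcurrentHashMap.newKeySet())
        .add(dataNode);
  }

  /** Item 1: a stream found the DataNode inaccessible. */
  void reportDead(String streamId, String dataNode) {
    addNodeToStream(streamId, dataNode);
    deadNodes.add(dataNode);
  }

  /** Item 3: a closed stream no longer references any DataNode. */
  void closeStream(String streamId) {
    streamNodes.remove(streamId);
  }

  /** Item 2: one probe round; nodes that answer again leave the dead set. */
  void probeOnce(Predicate<String> isReachable) {
    deadNodes.removeIf(isReachable);
  }

  /** Item 4: keep only dead nodes still referenced by an open stream. */
  Set<String> getDeadNodes() {
    Set<String> inUse = new HashSet<>();
    for (Set<String> nodes : streamNodes.values()) {
      inUse.addAll(nodes);
    }
    deadNodes.retainAll(inUse);
    return Collections.unmodifiableSet(new HashSet<>(deadNodes));
  }
}
```

In this sketch, a node reported dead by a stream stays in the shared set only while some open stream still references it and while probing keeps failing. Closing the last stream that used the node, or a successful probe, removes it again.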


        1. HDFS-14648.001.patch
          28 kB
          Lisheng Sun
        2. HDFS-14648.002.patch
          28 kB
          Lisheng Sun
        3. HDFS-14648.003.patch
          30 kB
          Lisheng Sun
        4. HDFS-14648.004.patch
          43 kB
          Lisheng Sun
        5. HDFS-14648.005.patch
          36 kB
          Lisheng Sun
        6. HDFS-14648.006.patch
          35 kB
          Lisheng Sun
        7. HDFS-14648.007.patch
          35 kB
          Lisheng Sun
        8. HDFS-14648.008.patch
          37 kB
          Lisheng Sun
        9. HDFS-14648.009.patch
          37 kB
          Lisheng Sun
        10. HDFS-14648.010.patch
          37 kB
          Lisheng Sun
        11. HDFS-14648.011.patch
          37 kB
          Lisheng Sun

          Issue Links



              • Assignee:
                leosun08 Lisheng Sun
              • Votes:
                0
              • Watchers:
                11


                • Created: