  1. Hadoop HDFS
  2. HDFS-15809

DeadNodeDetector doesn't remove live nodes from dead node set.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.1, 3.4.0
    • Component/s: None
    • Labels: None

    Description

      We found that the dead node detector might never remove live nodes from the dead node set in a large cluster. For example:

      1. DeadNodeDetector adds 200 nodes to the dead node set.
      2. DeadNodeDetector#checkDeadNodes() adds only 100 of them to deadNodesProbeQueue, because the queue length is limited to 100.
      3. The probe threads start working and probe 30 nodes.
      4. DeadNodeDetector#checkDeadNodes() is scheduled again. It iterates over the dead node set and adds 30 nodes to deadNodesProbeQueue. Because the iteration order is the same as last time, the 30 nodes that have already been probed are added to the queue again.
      5. Steps 3 and 4 repeat, and each time the same first 30 nodes from the dead node set are added. If those nodes are all really dead, the live nodes behind them are never probed and so are never removed from the dead node set.
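
      The steps above can be reproduced with a small standalone simulation. This is only a sketch and not Hadoop code: the class ProbeStarvationSketch, the method simulate, and the rotate flag are hypothetical names, node ids stand in for DatanodeInfo objects, and every probed node is assumed to still be dead (the worst case described in step 5). The rotating start offset is one possible mitigation shown for comparison; it is not necessarily what the attached patches do.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical simulation of the scenario in this report; not Hadoop code.
class ProbeStarvationSketch {
  static final int QUEUE_LIMIT = 100;     // probe queue length limit from the report
  static final int PROBES_PER_ROUND = 30; // nodes probed between two scheduling runs

  /**
   * Runs the given number of scheduling rounds and returns the ids of all
   * nodes that were probed at least once. Nodes are numbered 0..deadNodes-1
   * and are assumed to stay in the dead set after each probe.
   */
  static Set<Integer> simulate(int deadNodes, int rounds, boolean rotate) {
    Queue<Integer> probeQueue = new ArrayDeque<>();
    Set<Integer> probed = new HashSet<>();
    int cursor = 0; // where the next scan starts when rotate is enabled

    for (int round = 0; round < rounds; round++) {
      // Stand-in for checkDeadNodes(): scan the dead set in a fixed order
      // and enqueue nodes until the queue limit is reached.
      int start = rotate ? cursor : 0;
      int offered = 0;
      for (int i = 0; i < deadNodes && probeQueue.size() < QUEUE_LIMIT; i++) {
        probeQueue.add((start + i) % deadNodes);
        offered++;
      }
      cursor = (start + offered) % deadNodes;

      // Stand-in for the probe threads: take a fixed number of nodes.
      for (int i = 0; i < PROBES_PER_ROUND && !probeQueue.isEmpty(); i++) {
        probed.add(probeQueue.poll());
      }
    }
    return probed;
  }

  public static void main(String[] args) {
    // Without rotation only nodes 0..99 are ever probed.
    System.out.println(simulate(200, 50, false).size());
    // With rotation every node is eventually probed.
    System.out.println(simulate(200, 50, true).size());
  }
}
```

      With 200 dead nodes and 50 rounds, the non-rotating run only ever probes nodes 0 to 99, so a live node among 100 to 199 is never rechecked. The rotating run reaches all 200 nodes because each scan resumes where the previous one stopped.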

      Attachments

        1. HDFS-15809.001.patch
          17 kB
          Jinglun
        2. HDFS-15809.002.patch
          17 kB
          Jinglun
        3. HDFS-15809.003.patch
          18 kB
          Jinglun
        4. HDFS-15809.004.patch
          15 kB
          Jinglun
        5. HDFS-15809.005.patch
          15 kB
          Jinglun
        6. HDFS-15809.006.patch
          15 kB
          Jinglun
        7. HDFS-15809.007.patch
          15 kB
          Jinglun

            People

              Assignee: LiJinglun Jinglun
              Reporter: LiJinglun Jinglun
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved: