Hadoop HDFS / HDFS-11285

Dead DataNodes stay in (Dead, DECOMMISSION_INPROGRESS) for a long time and never transition to (Dead, DECOMMISSIONED)


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.1
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      We have seen the use case of decommissioning DataNodes that are already dead or unresponsive and not expected to rejoin the cluster. In a large cluster, we hit a case where more than 100 nodes were dead and stuck in decommissioning, even though their "Under replicated blocks" and "Blocks with no live replicas" counts were all ZERO. This was actually fixed in HDFS-7374; with that fix, running refreshNodes twice eliminates the problem. However, the patch seems to have been lost in the HDFS-7411 refactor. We are using a Hadoop version based on 2.7.1, and only the sequence of operations below can transition the status from (Dead, DECOMMISSION_INPROGRESS) to (Dead, DECOMMISSIONED):

      1. Remove the node from the hdfs-exclude file
      2. Run refreshNodes
      3. Re-add the node to the hdfs-exclude file
      4. Run refreshNodes again (see the command sketch below)
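
      As a sketch only, the same workaround expressed as dfsadmin commands. The exclude file path and the hostname below are placeholders; the real path is whatever dfs.hosts.exclude points to in hdfs-site.xml.

      # Placeholder path; use the file configured by dfs.hosts.exclude.
      EXCLUDE=/etc/hadoop/conf/hdfs-exclude

      # Steps 1-2: drop the dead node (placeholder hostname) and refresh.
      sed -i '/^dn-42.example.com$/d' "$EXCLUDE"
      hdfs dfsadmin -refreshNodes

      # Steps 3-4: re-add it and refresh again; the node then reaches
      # (Dead, DECOMMISSIONED).
      echo 'dn-42.example.com' >> "$EXCLUDE"
      hdfs dfsadmin -refreshNodes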

      So, why was this code removed during the refactor into the new DecommissionManager?

      if (!node.isAlive) {
        LOG.info("Dead node " + node + " is decommissioned immediately.");
        node.setDecommissioned();
      }
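      To make the intended transition concrete, here is a small self-contained Java sketch. It is not HDFS code: Node and AdminState below are simplified stand-ins for DatanodeDescriptor and DatanodeInfo.AdminStates, and check() models only the dead-node fast path that the refactor dropped.

      // Simplified stand-ins for DatanodeDescriptor and DatanodeInfo.AdminStates;
      // illustrative only, not the real HDFS classes.
      enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

      class Node {
        final String name;
        boolean isAlive;
        AdminState adminState;

        Node(String name, boolean isAlive, AdminState adminState) {
          this.name = name;
          this.isAlive = isAlive;
          this.adminState = adminState;
        }
      }

      public class DecommissionCheckDemo {
        // The guard HDFS-7374 added and the HDFS-7411 refactor dropped:
        // a dead node cannot serve its replicas, so there is nothing to
        // wait for; it can be marked DECOMMISSIONED immediately.
        static void check(Node node) {
          if (node.adminState == AdminState.DECOMMISSION_INPROGRESS && !node.isAlive) {
            System.out.println("Dead node " + node.name + " is decommissioned immediately.");
            node.adminState = AdminState.DECOMMISSIONED;
            return;
          }
          // A live node would instead be scanned block-by-block and only moved
          // to DECOMMISSIONED once every block is sufficiently replicated.
        }

        public static void main(String[] args) {
          Node dead = new Node("dn-42", false, AdminState.DECOMMISSION_INPROGRESS);
          check(dead);
          System.out.println(dead.name + " -> " + dead.adminState); // prints DECOMMISSIONED
        }
      }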

      Attachments

        1. DecomStatus.png (51 kB, Lantao Jin)


      People

        Assignee: Unassigned
        Reporter: Lantao Jin (cltlfcjin)
        Votes: 0
        Watchers: 5
