Hadoop HDFS / HDFS-1590

Decommissioning never ends when node to decommission has blocks that are under-replicated and cannot be replicated to the expected level of replication


    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.20.2
    • Fix Version/s: None
    • Component/s: namenode
    • Labels: None
    • Environment: Linux

    Description

      On a test cluster with 4 DNs and a default replication level of 3, I recently attempted to decommission one of the DNs. Right after modifying the dfs.hosts.exclude file and running 'dfsadmin -refreshNodes', I could see the blocks being replicated to other nodes.

      After a while, the replication stopped but the node was not marked as decommissioned.

      When running an 'fsck -files -blocks -locations' I saw that all files had an actual replication of 4 (which is logical given there are 4 DNs), but some files had an expected replication of 10 (those were job.jar files from M/R jobs).

      I ran 'fs -setrep 3' on those files, and shortly afterwards the NameNode reported the DN as decommissioned.
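
      For reference, a minimal sketch of that workaround driven through the client API instead of 'fs -setrep' by hand. The class name CapReplicationSketch and the recursive walk are hypothetical (not part of Hadoop or of this report); listStatus() and setReplication() are the standard org.apache.hadoop.fs.FileSystem calls. It lowers any requested replication that exceeds the number of live DNs.

      {code:java}
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      /** Sketch only: cap any requested replication that exceeds the live DN count. */
      public class CapReplicationSketch {
        public static void main(String[] args) throws Exception {
          Path root = new Path(args[0]);              // hypothetical starting directory, e.g. /
          short maxRepl = Short.parseShort(args[1]);  // e.g. 3, the number of remaining live DNs
          FileSystem fs = FileSystem.get(new Configuration());
          cap(fs, root, maxRepl);
        }

        private static void cap(FileSystem fs, Path dir, short maxRepl) throws Exception {
          for (FileStatus st : fs.listStatus(dir)) {
            if (st.isDir()) {
              cap(fs, st.getPath(), maxRepl);         // recurse into sub-directories
            } else if (st.getReplication() > maxRepl) {
              // Same effect as 'hadoop fs -setrep 3 <path>' from the report above.
              fs.setReplication(st.getPath(), maxRepl);
            }
          }
        }
      }
      {code}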

      Shouldn't this case be checked by the NameNode when decommissioning a node? I.e., consider a node decommissioned if either of the following is true for each block on the node being decommissioned (a sketch of this check follows the list):

      1. It is replicated more than the expected replication level.
      2. It is replicated as much as possible given the available nodes, even though it is less replicated than expected.
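
      To make the proposal concrete, here is a minimal sketch of the per-block test, assuming the NameNode knows, for each block, the number of live replicas held on nodes other than the one being decommissioned. This is not the actual NameNode code; the class, method, and parameter names are hypothetical.

      {code:java}
      /** Sketch only: per-block test for the two conditions listed above. */
      public class DecommissionCheckSketch {

        static boolean blockNoLongerNeedsNode(int replicasOnOtherLiveNodes,
                                              int expectedReplication,
                                              int otherLiveDataNodes) {
          // 1. Already replicated at (or above) the expected level without this node.
          if (replicasOnOtherLiveNodes >= expectedReplication) {
            return true;
          }
          // 2. Replicated as much as possible given the available nodes: every other
          //    live DN already holds a copy, so waiting for more replicas can never
          //    succeed (e.g. expected repl=10 on a 4-node cluster).
          return replicasOnOtherLiveNodes >= otherLiveDataNodes;
        }

        public static void main(String[] args) {
          // The situation from this report: expected repl 10, 4 DNs, one decommissioning.
          System.out.println(blockNoLongerNeedsNode(3, 10, 3)); // true, via condition 2
          // A normal repl=3 file whose blocks are already copied off the node.
          System.out.println(blockNoLongerNeedsNode(3, 3, 3));  // true, via condition 1
        }
      }
      {code}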


    People

    • Assignee: Unassigned
    • Reporter: Mathias Herberts (herberts)
    • Votes: 0
    • Watchers: 12
