Hadoop HDFS / HDFS-1590

Decommissioning never ends when node to decommission has blocks that are under-replicated and cannot be replicated to the expected level of replication


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.20.2
    • Fix Version/s: None
    • Component/s: namenode
    • Labels: None
    • Environment: Linux

    Description

      On a test cluster with 4 DNs and a default replication factor of 3, I recently attempted to decommission one of the DNs. Right after modifying the dfs.hosts.exclude file and running 'dfsadmin -refreshNodes', I could see the blocks being replicated to other nodes.

      After a while, the replication stopped but the node was not marked as decommissioned.

      When running an 'fsck -files -blocks -locations' I saw that all files had a replication of 4 (which is logical given there are 4 DNs), but some of the files had an expected replication set to 10 (those were job.jar files from M/R jobs).

      I ran 'fs -setrep 3' on those files, and shortly afterwards the NameNode reported the DN as decommissioned.

      Shouldn't the NameNode check for this case when decommissioning a node? I.e., consider a node decommissioned if, for each block it holds, at least one of the following is true (see the sketch after the list below):

      1. It is replicated more than the expected replication level.
      2. It is replicated as much as possible given the available nodes, even though it is less replicated than expected.
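
      A minimal sketch of the proposed completion rule, assuming only the two conditions above; the class and method names are purely illustrative and are not the actual NameNode decommissioning code:

      // Illustrative only: a standalone sketch of the completion rule proposed
      // above, not the real 0.20 NameNode implementation.
      public class DecommissionCheckSketch {

          /**
           * A block on the decommissioning node should no longer hold up
           * decommissioning if either:
           *  1. it already has at least the expected number of replicas on
           *     other live nodes (i.e. the copy being retired is surplus), or
           *  2. it already has a replica on every remaining live node, so no
           *     further replication is possible.
           */
          static boolean blockDoneForDecommission(int replicasOnOtherLiveNodes,
                                                  int expectedReplication,
                                                  int otherLiveNodes) {
              return replicasOnOtherLiveNodes >= expectedReplication
                  || replicasOnOtherLiveNodes >= otherLiveNodes;
          }

          public static void main(String[] args) {
              // job.jar case from the description: expected replication 10,
              // 4 DNs minus the decommissioning one leaves 3 possible targets.
              System.out.println(blockDoneForDecommission(3, 10, 3)); // true under the proposed rule
              // Without rule 2, the check effectively demands 10 replicas elsewhere,
              // which can never be reached, so decommissioning never completes.
          }
      }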


    People

      Assignee: Unassigned
      Reporter: Mathias Herberts (herberts)
      Votes: 0
      Watchers: 13
