Details
- Type: Bug
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 0.20.2
- Fix Version/s: None
- Component/s: None
- Environment: Linux
Description
On a test cluster with 4 DNs and a default replication level of 3, I recently attempted to decommission one of the DNs. Right after modifying the dfs.hosts.exclude file and running 'dfsadmin -refreshNodes', I could see blocks being replicated to the other nodes.
After a while the replication stopped, but the node was never marked as decommissioned.
Running 'fsck -files -blocks -locations' showed that all files had a replication of 4 (which is logical given there are only 4 DNs), but some of the files had an expected replication of 10 (these were job.jar files from M/R jobs).
I ran 'fs -setrep 3' on those files, and shortly afterwards the NameNode reported the DN as decommissioned.
Shouldn't the NameNode check for this case when decommissioning a node? I.e., consider a node decommissioned if, for each block on the node being decommissioned, at least one of the following is true:
1. The block is replicated at or above its expected replication level.
2. The block is replicated on as many nodes as possible given the nodes available, even though that is fewer than the expected replication level.
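The proposed check could be sketched as follows. This is a hypothetical illustration, not actual NameNode code; the class and method names, and the way replica counts are passed in, are assumptions made for the example:

```java
// Hypothetical sketch of the two-condition decommissioning check
// proposed above; not taken from the HDFS source.
public class DecommissionCheck {

    /**
     * Returns true if this block should no longer hold up decommissioning.
     *
     * @param liveReplicas   replicas on nodes NOT being decommissioned
     * @param expectedRepl   the file's configured replication factor
     * @param availableNodes live, non-decommissioning DataNodes in the cluster
     */
    static boolean isSufficientlyReplicated(int liveReplicas,
                                            int expectedRepl,
                                            int availableNodes) {
        // Condition 1: replicated at or above the expected level.
        if (liveReplicas >= expectedRepl) {
            return true;
        }
        // Condition 2: replicated on every node the cluster can offer,
        // even though that is below the expected level (e.g. a job.jar
        // with replication 10 on a 4-DN cluster).
        return liveReplicas >= availableNodes;
    }

    public static void main(String[] args) {
        // 4 DNs, one decommissioning -> 3 available; expected repl 10.
        System.out.println(isSufficientlyReplicated(3, 10, 3)); // true
        // Normal under-replication: expected 3, only 2 live replicas,
        // with 3 nodes still available to host a third copy.
        System.out.println(isSufficientlyReplicated(2, 3, 3));  // false
    }
}
```

With such a check, the job.jar files above (expected replication 10, but only 3 remaining DNs to hold them) would satisfy condition 2, and the decommission would complete without a manual 'fs -setrep'.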