Hadoop HDFS / HDFS-11285

Dead DataNodes stay in (Dead, DECOMMISSION_INPROGRESS) for a long time and never transition to (Dead, DECOMMISSIONED)


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.1
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      We have seen the use case of decommissioning DataNodes that are already dead or unresponsive and not expected to rejoin the cluster. In a large cluster, we hit a case where more than 100 nodes were dead and stuck in decommissioning, even though their "Under replicated blocks" and "Blocks with no live replicas" counts were all ZERO. This was actually fixed in HDFS-7374; with that fix, running refreshNodes twice eliminates the problem. However, the patch seems to have been lost in the HDFS-7411 refactor. We are using a Hadoop version based on 2.7.1, and only the sequence of operations below can transition the status from (Dead, DECOMMISSION_INPROGRESS) to (Dead, DECOMMISSIONED):

      1. Remove the node from the hdfs-exclude file
      2. Run refreshNodes
      3. Re-add the node to the hdfs-exclude file
      4. Run refreshNodes again (see the command sketch below)
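
      As a sketch only, the same workaround expressed as dfsadmin commands. The exclude file path and the hostname below are placeholders; the real path is whatever dfs.hosts.exclude points to in hdfs-site.xml.

      # Placeholder path; use the file configured by dfs.hosts.exclude.
      EXCLUDE=/etc/hadoop/conf/hdfs-exclude

      # Steps 1-2: drop the dead node (placeholder hostname) and refresh.
      sed -i '/^dn-42.example.com$/d' "$EXCLUDE"
      hdfs dfsadmin -refreshNodes

      # Steps 3-4: re-add it and refresh again; the node then reaches
      # (Dead, DECOMMISSIONED).
      echo 'dn-42.example.com' >> "$EXCLUDE"
      hdfs dfsadmin -refreshNodes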

      So, why was this code removed during the refactor into the new DecommissionManager?

      if (!node.isAlive) {
        LOG.info("Dead node " + node + " is decommissioned immediately.");
        node.setDecommissioned();
      }
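      To make the intended transition concrete, here is a small self-contained Java sketch. It is not HDFS code: Node and AdminState below are simplified stand-ins for DatanodeDescriptor and DatanodeInfo.AdminStates, and check() models only the dead-node fast path that the refactor dropped.

      // Simplified stand-ins for DatanodeDescriptor and DatanodeInfo.AdminStates;
      // illustrative only, not the real HDFS classes.
      enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

      class Node {
        final String name;
        boolean isAlive;
        AdminState adminState;

        Node(String name, boolean isAlive, AdminState adminState) {
          this.name = name;
          this.isAlive = isAlive;
          this.adminState = adminState;
        }
      }

      public class DecommissionCheckDemo {
        // The guard HDFS-7374 added and the HDFS-7411 refactor dropped:
        // a dead node cannot serve its replicas, so there is nothing to
        // wait for; it can be marked DECOMMISSIONED immediately.
        static void check(Node node) {
          if (node.adminState == AdminState.DECOMMISSION_INPROGRESS && !node.isAlive) {
            System.out.println("Dead node " + node.name + " is decommissioned immediately.");
            node.adminState = AdminState.DECOMMISSIONED;
            return;
          }
          // A live node would instead be scanned block-by-block and only moved
          // to DECOMMISSIONED once every block is sufficiently replicated.
        }

        public static void main(String[] args) {
          Node dead = new Node("dn-42", false, AdminState.DECOMMISSION_INPROGRESS);
          check(dead);
          System.out.println(dead.name + " -> " + dead.adminState); // prints DECOMMISSIONED
        }
      }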

      Attachments

        1. DecomStatus.png (51 kB, Lantao Jin)


      People

        Assignee: Unassigned
        Reporter: Lantao Jin (cltlfcjin)
        Votes: 0
        Watchers: 5
