Details
- Type: Bug
- Priority: Major
- Status: Resolved
- Resolution: Fixed
Description
As part of the fix merged in HDFS-16303 (https://issues.apache.org/jira/browse/HDFS-16303), a rare edge case was noticed in DatanodeAdminDefaultMonitor that can cause a DatanodeDescriptor to be added twice to the pendingNodes queue:
- a [datanode is unhealthy so it gets added to "unhealthyDns"](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java#L227)
- an exception is thrown which causes [this catch block](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java#L271) to execute
- the [datanode is added to "pendingNodes"](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java#L276)
- under certain conditions the [datanode can be added again from "unhealthyDns" to "pendingNodes" here](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java#L296)
This Jira tracks the one-line fix for this bug.
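The double-enqueue described above can be sketched with simplified stand-in collections. This is a minimal reproduction of the control flow, not the actual Hadoop code: the class, method, and the placement of the one-line guard (removing the node from "unhealthyDns" inside the catch block) are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class DoubleAddSketch {
    // Simplified stand-ins for the collections in DatanodeAdminDefaultMonitor.
    static Queue<String> pendingNodes = new ArrayDeque<>();
    static List<String> unhealthyDns = new ArrayList<>();

    // Simulates one monitor scan in which checking a datanode throws.
    static void scan(String dn, boolean withFix) {
        try {
            // Step 1: the datanode is unhealthy, so it is recorded.
            unhealthyDns.add(dn);
            throw new IllegalStateException("simulated failure during the check");
        } catch (IllegalStateException e) {
            if (withFix) {
                // Hypothetical one-line guard: drop the node from unhealthyDns
                // so it is not requeued a second time below.
                unhealthyDns.remove(dn);
            }
            // Step 2: the catch block requeues the datanode.
            pendingNodes.add(dn);
        }
        // Step 3: after the scan, unhealthy datanodes are also requeued --
        // without the guard, dn is now in pendingNodes twice.
        pendingNodes.addAll(unhealthyDns);
        unhealthyDns.clear();
    }

    public static void main(String[] args) {
        scan("dn-1", false);
        System.out.println(pendingNodes); // duplicate entry: [dn-1, dn-1]
        pendingNodes.clear();
        scan("dn-1", true);
        System.out.println(pendingNodes); // [dn-1]
    }
}
```

Running the sketch without the guard yields the duplicate entry; with the guard, the node is enqueued exactly once.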
Issue Links
- is caused by: HDFS-16303 Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning (Resolved)