Details
- Type: Bug
- Priority: Major
- Status: Resolved
- Resolution: Fixed
Description
As part of the fix merged in HDFS-16303 (https://issues.apache.org/jira/browse/HDFS-16303), a rare edge case was noticed in DatanodeAdminDefaultMonitor that can cause a DatanodeDescriptor to be added twice to the pendingNodes queue:
- a [datanode is unhealthy so it gets added to "unhealthyDns"](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java#L227)
- an exception is thrown which causes [this catch block](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java#L271) to execute
- the [datanode is added to "pendingNodes"](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java#L276)
- under certain conditions the [datanode can be added again from "unhealthyDns" to "pendingNodes" here](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java#L296)
This Jira tracks the one-line fix for this bug.
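The double-enqueue described above can be sketched with simplified stand-in collections. This is a minimal reproduction of the control flow, not the actual Hadoop code: the class, method, and the placement of the one-line guard (removing the node from "unhealthyDns" inside the catch block) are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class DoubleAddSketch {
    // Simplified stand-ins for the collections in DatanodeAdminDefaultMonitor.
    static Queue<String> pendingNodes = new ArrayDeque<>();
    static List<String> unhealthyDns = new ArrayList<>();

    // Simulates one monitor scan in which checking a datanode throws.
    static void scan(String dn, boolean withFix) {
        try {
            // Step 1: the datanode is unhealthy, so it is recorded.
            unhealthyDns.add(dn);
            throw new IllegalStateException("simulated failure during the check");
        } catch (IllegalStateException e) {
            if (withFix) {
                // Hypothetical one-line guard: drop the node from unhealthyDns
                // so it is not requeued a second time below.
                unhealthyDns.remove(dn);
            }
            // Step 2: the catch block requeues the datanode.
            pendingNodes.add(dn);
        }
        // Step 3: after the scan, unhealthy datanodes are also requeued --
        // without the guard, dn is now in pendingNodes twice.
        pendingNodes.addAll(unhealthyDns);
        unhealthyDns.clear();
    }

    public static void main(String[] args) {
        scan("dn-1", false);
        System.out.println(pendingNodes); // duplicate entry: [dn-1, dn-1]
        pendingNodes.clear();
        scan("dn-1", true);
        System.out.println(pendingNodes); // [dn-1]
    }
}
```

Running the sketch without the guard yields the duplicate entry; with the guard, the node is enqueued exactly once.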
Issue Links
- is caused by: HDFS-16303 Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning (Resolved)