Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Not A Problem
Description
In some cases, such as when there is a storage problem, a DataNode's capacity and block count become zero.
When we tried to decommission those DataNodes, we ran into an issue where the decommission never completed because the NameNode had not received their first block reports.
INFO blockmanagement.DatanodeAdminManager (DatanodeAdminManager.java:startDecommission(183)) - Starting decommission of 127.0.0.1:58343 [DISK]DS-a29de094-2b19-4834-8318-76cda3bd86bf:NORMAL:127.0.0.1:58343 with 0 blocks
INFO blockmanagement.BlockManager (BlockManager.java:isNodeHealthyForDecommissionOrMaintenance(4587)) - Node 127.0.0.1:58343 hasn't sent its first block report.
INFO blockmanagement.DatanodeAdminDefaultMonitor (DatanodeAdminDefaultMonitor.java:check(258)) - Node 127.0.0.1:58343 isn't healthy. It needs to replicate 0 more blocks. Decommission In Progress is still in progress.
To make matters worse, even after we stopped these DataNodes, they remained in a dead & decommissioning state until the NameNode was restarted.
I think those DataNodes should be decommissioned immediately even if the NameNode hasn't received their first block report.
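For context, the log above comes from BlockManager#isNodeHealthyForDecommissionOrMaintenance and DatanodeAdminDefaultMonitor#check. Below is a minimal, self-contained sketch of the gate they appear to apply, based only on the log lines; it is not the actual NameNode code, and the class, field, and method names are illustrative assumptions.

// Sketch of the health gate reflected in the log: a node that has never sent a
// block report is treated as "not healthy", so the decommission monitor keeps
// it in DECOMMISSION_IN_PROGRESS on every pass, even with 0 blocks to replicate.
final class DecommissionHealthCheckSketch {

  // Illustrative stand-in for the per-DataNode state the NameNode tracks.
  static final class NodeState {
    final String address;
    final boolean firstBlockReportReceived; // assumed flag
    final int blocksPendingReplication;     // assumed counter

    NodeState(String address, boolean firstBlockReportReceived,
              int blocksPendingReplication) {
      this.address = address;
      this.firstBlockReportReceived = firstBlockReportReceived;
      this.blocksPendingReplication = blocksPendingReplication;
    }
  }

  // Mirrors the gate hit in the log: no first block report => not healthy.
  static boolean isHealthyForDecommission(NodeState node) {
    if (!node.firstBlockReportReceived) {
      System.out.printf("Node %s hasn't sent its first block report.%n",
          node.address);
      return false;
    }
    return true;
  }

  // One pass of the monitor: a node with zero blocks but no report stays stuck.
  static void checkOnce(NodeState node) {
    if (node.blocksPendingReplication == 0 && isHealthyForDecommission(node)) {
      System.out.printf("Node %s can be marked DECOMMISSIONED.%n", node.address);
    } else {
      System.out.printf(
          "Node %s isn't healthy. It needs to replicate %d more blocks."
              + " Decommission In Progress is still in progress.%n",
          node.address, node.blocksPendingReplication);
    }
  }

  public static void main(String[] args) {
    // Reproduces the reported situation: zero blocks, but no block report yet.
    checkOnce(new NodeState("127.0.0.1:58343", false, 0));
  }
}

Under this assumed logic, relaxing the check for a node with zero blocks (or for a node that is already dead) would let the decommission complete immediately, which is what the description proposes.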
Attachments
Issue Links
- is superceded by: HDFS-15963 Unreleased volume references cause an infinite loop (Resolved)
- links to