Hadoop HDFS / HDFS-15945

DataNodes with zero capacity and zero blocks should be decommissioned immediately


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None

      Description

      When a DataNode has a storage problem, its capacity and block count sometimes become zero.
      When we tried to decommission such DataNodes, the decommission never completed because the NameNode had not received their first block report.

      INFO  blockmanagement.DatanodeAdminManager (DatanodeAdminManager.java:startDecommission(183)) - Starting decommission of 127.0.0.1:58343 [DISK]DS-a29de094-2b19-4834-8318-76cda3bd86bf:NORMAL:127.0.0.1:58343 with 0 blocks
      INFO  blockmanagement.BlockManager (BlockManager.java:isNodeHealthyForDecommissionOrMaintenance(4587)) - Node 127.0.0.1:58343 hasn't sent its first block report.
      INFO  blockmanagement.DatanodeAdminDefaultMonitor (DatanodeAdminDefaultMonitor.java:check(258)) - Node 127.0.0.1:58343 isn't healthy. It needs to replicate 0 more blocks. Decommission In Progress is still in progress.
      

      To make matters worse, even after we stopped these DataNodes, they remained in a dead & decommissioning state until the NameNode was restarted.

      I think those DataNodes should be decommissioned immediately, even if the NameNode hasn't received their first block report.
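
      The proposal can be illustrated with a minimal Java sketch. This is a simplified model and not the actual Hadoop code: the class `DecommissionCheckSketch`, the type `NodeView`, and both method names are hypothetical, and the sketch covers only the first-block-report condition visible in the log above. The real `isNodeHealthyForDecommissionOrMaintenance` check presumably considers more than this. Since the issue was resolved as Not A Problem, the sketch describes the suggestion only, not a change that exists in Hadoop.

```java
// Simplified, hypothetical model; not the actual Hadoop BlockManager code.
final class DecommissionCheckSketch {

    /** Hypothetical stand-in for what the NameNode knows about one DataNode. */
    static final class NodeView {
        final long capacityBytes;
        final long blockCount;
        final boolean firstBlockReportReceived;

        NodeView(long capacityBytes, long blockCount, boolean firstBlockReportReceived) {
            this.capacityBytes = capacityBytes;
            this.blockCount = blockCount;
            this.firstBlockReportReceived = firstBlockReportReceived;
        }
    }

    /** Behaviour suggested by the log: no first block report means "not healthy". */
    static boolean isHealthyCurrent(NodeView node) {
        return node.firstBlockReportReceived;
    }

    /** Proposed relaxation: a node with zero capacity and zero blocks has nothing to replicate. */
    static boolean isHealthyProposed(NodeView node) {
        boolean empty = node.capacityBytes == 0 && node.blockCount == 0;
        return node.firstBlockReportReceived || empty;
    }

    public static void main(String[] args) {
        // A DataNode with a storage problem: no capacity, no blocks, no block report yet.
        NodeView broken = new NodeView(0L, 0L, false);
        System.out.println("current:  " + isHealthyCurrent(broken));
        System.out.println("proposed: " + isHealthyProposed(broken));
    }
}
```

      Under the relaxed check, a node that still reports capacity or blocks must send its first block report as before. Only the fully empty case is let through.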


            People

            • Assignee: Takanobu Asanuma (tasanuma)
            • Reporter: Takanobu Asanuma (tasanuma)

                Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 4h
