Hadoop HDFS / HDFS-9287

Block placement completely fails if too many nodes are decommissioning

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 2.6.0
    • Fix Version/s: None
    • Component/s: namenode
    • Labels:
      None
    • Target Version/s:

      Description

      The DatanodeManager coordinates with the HeartbeatManager to update HeartbeatManager.Stats, which tracks cluster capacity and load. These stats are crucial because block placement takes space and load into account. The bookkeeping is completely broken for decommissioning nodes.

      The HeartbeatManager subtracts a node's prior values before it adds the new values. During registration of a decommissioning node, it subtracts before the initial values have been seeded. This decrements nodesInService. The node's state then flips to decommissioning, so the subsequent add (correctly) does not increment nodesInService. The net effect is a spurious decrement. There are other math bugs (double adding) that only work by accident because the values involved are 0.
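      To make the subtract-before-seed ordering concrete, here is a minimal sketch under stated assumptions. It is a hypothetical, heavily simplified model: the class, the AdminState enum, the Node and Stats types, and the registerDecommissioning and simulate helpers are all illustrative names, not the real HDFS classes. Only nodesInService and a capacity total are modeled, and only in-service nodes contribute to them. It shows the node count drifting by one per decommissioning registration, while the capacity total survives only because the unseeded capacity is 0.

```java
/**
 * Hypothetical, simplified model of the stats bookkeeping described above.
 * Class, method, and field names are illustrative, not the real HDFS code.
 */
public class DecommStatsBug {

    enum AdminState { IN_SERVICE, DECOMMISSIONING }

    static final class Node {
        AdminState state = AdminState.IN_SERVICE;
        final long capacity; // 0 until the first heartbeat seeds real values

        Node(long capacity) { this.capacity = capacity; }
    }

    static final class Stats {
        int nodesInService;
        long capacityTotal;

        // In this model, only in-service nodes contribute to the totals.
        void add(Node n) {
            if (n.state == AdminState.IN_SERVICE) {
                nodesInService++;
                capacityTotal += n.capacity;
            }
        }

        void subtract(Node n) {
            if (n.state == AdminState.IN_SERVICE) {
                nodesInService--;
                capacityTotal -= n.capacity;
            }
        }
    }

    /** Buggy order: subtract runs for a node that was never added. */
    static void registerDecommissioning(Stats stats, Node n) {
        stats.subtract(n);                    // still IN_SERVICE: spurious decrement
        n.state = AdminState.DECOMMISSIONING;
        stats.add(n);                         // DECOMMISSIONING: no increment (correct)
    }

    /** Adds `healthy` seeded in-service nodes, then registers `decomm` decommissioning ones. */
    static Stats simulate(int healthy, int decomm) {
        Stats stats = new Stats();
        for (int i = 0; i < healthy; i++) {
            stats.add(new Node(100));         // seeded capacity: 100 units each
        }
        for (int i = 0; i < decomm; i++) {
            registerDecommissioning(stats, new Node(0)); // unseeded: capacity 0
        }
        return stats;
    }

    public static void main(String[] args) {
        Stats s = simulate(5, 3);
        // Five nodes are really in service, but the buggy order reports 2.
        // capacityTotal stays 500 only because the unseeded capacity was 0.
        System.out.println("nodesInService=" + s.nodesInService
                + " capacityTotal=" + s.capacityTotal);
    }
}
```

      Running main prints nodesInService=2 capacityTotal=500: the count is wrong by one per decommissioning node, while the capacity total is correct only by accident.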

      As a result, every decommissioning node decrements the node count used for block placement. Once enough nodes are decommissioning, the replication monitor silently stops working: nothing is logged, and it searches all nodes and simply gives up. Eventually all block allocation fails as well, so no files can be created and no jobs can be submitted.
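      Because each decommissioning registration costs one spurious decrement, the error grows with the number of decommissioning nodes. In a simplified model the count eventually reaches zero or goes negative. One way to avoid the drift is to track which nodes currently contribute to the totals and make subtract a no-op for a node that was never added. This is sketched here purely as an assumption. It is not the change made in HDFS-7725, whose details this issue does not describe. All names below (GuardedDecommStats, AdminState, Node, Stats, registerDecommissioning, simulate) are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical guarded variant of a simplified stats model: a node's
 * contribution is subtracted only if it was previously added. This is an
 * illustrative assumption, not the actual HDFS-7725 change.
 */
public class GuardedDecommStats {

    enum AdminState { IN_SERVICE, DECOMMISSIONING }

    static final class Node {
        AdminState state = AdminState.IN_SERVICE;
        final long capacity; // 0 until the first heartbeat seeds real values

        Node(long capacity) { this.capacity = capacity; }
    }

    static final class Stats {
        int nodesInService;
        long capacityTotal;
        // Nodes currently contributing to the totals (identity-based membership).
        private final Set<Node> counted = new HashSet<>();

        void add(Node n) {
            // Count only in-service nodes, and never the same node twice.
            if (n.state == AdminState.IN_SERVICE && counted.add(n)) {
                nodesInService++;
                capacityTotal += n.capacity;
            }
        }

        void subtract(Node n) {
            // No-op for a node that was never added.
            if (counted.remove(n)) {
                nodesInService--;
                capacityTotal -= n.capacity;
            }
        }
    }

    /** Same subtract/flip/add order as the buggy path, now harmless. */
    static void registerDecommissioning(Stats stats, Node n) {
        stats.subtract(n);                    // never added: guarded no-op
        n.state = AdminState.DECOMMISSIONING;
        stats.add(n);                         // DECOMMISSIONING: not counted
    }

    /** Adds `healthy` seeded in-service nodes, then registers `decomm` decommissioning ones. */
    static Stats simulate(int healthy, int decomm) {
        Stats stats = new Stats();
        for (int i = 0; i < healthy; i++) {
            stats.add(new Node(100));
        }
        for (int i = 0; i < decomm; i++) {
            registerDecommissioning(stats, new Node(0));
        }
        return stats;
    }

    public static void main(String[] args) {
        Stats s = simulate(5, 10);
        // The count stays at 5 regardless of how many nodes are decommissioning.
        System.out.println("nodesInService=" + s.nodesInService
                + " capacityTotal=" + s.capacityTotal);
    }
}
```

      The membership set makes both operations idempotent. A repeated add does not double count, and subtract only removes what was actually added. The cost is one set entry per in-service node.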

        Issue Links

          Activity

          Kihwal Lee added a comment:

          HDFS-4861 was filed to fix the node count issue.

          Kihwal Lee added a comment:

          HDFS-4937 is also related.

          Kuhu Shukla added a comment:

          HDFS-7725 fixes this issue. Verified through a unit test.

            People

            • Assignee: Kuhu Shukla
            • Reporter: Daryn Sharp
            • Votes: 0
            • Watchers: 22

              Dates

              • Created:
              • Updated:
              • Resolved:
