Hadoop HDFS / HDFS-8056

Decommissioned dead nodes should continue to be counted as dead after NN restart


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • None
    • Hadoop Flags: Reviewed

    Description

      We had an offline discussion about this with Andrew Wang and Colin McCabe. Raising it here to gather more input and to get the patch in.

      Dead nodes are tracked in DatanodeManager's datanodeMap. However, nodes that were already dead before an NN restart are not in datanodeMap after the restart. DatanodeManager's getDatanodeListForReport adds such dead nodes back to the report, but only if they are not in the exclude file:

          if (listDeadNodes) {
            for (InetSocketAddress addr : includedNodes) {
              if (foundNodes.matchedBy(addr) || excludedNodes.match(addr)) {
                continue;
              }
              // The remaining nodes are ones that are referenced by the hosts
              // files but that we do not know about, i.e. that we have never
              // heard from. E.g. an entry that is no longer part of the cluster,
              // or a bogus entry given in the hosts files.
              //
              // If the host file entry specified the xferPort, we use that.
              // Otherwise, we guess that it is the default xfer port.
              // We can't ask the DataNode what it had configured, because it's
              // dead.
              DatanodeDescriptor dn = new DatanodeDescriptor(new DatanodeID(addr
                      .getAddress().getHostAddress(), addr.getHostName(), "",
                      addr.getPort() == 0 ? defaultXferPort : addr.getPort(),
                      defaultInfoPort, defaultInfoSecurePort, defaultIpcPort));
              setDatanodeDead(dn);
              nodes.add(dn);
            }
          }
      

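      The problem is that the decommissioned dead node counts exposed via JMX change after an NN restart. A decommissioned node is listed in the exclude file, so once it is dead and no longer in datanodeMap, the loop above skips it and it is not reported as dead at all. It would be better to keep these counts consistent across NN restarts.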
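To make the mismatch concrete, here is a minimal Java sketch. It is a hypothetical model, not HDFS code: `DecomDeadNodeSketch`, `Node`, `deadNodes`, and the `keepExcluded` flag are invented for illustration, hosts are keyed by host name rather than by DatanodeID, and both dn1 and dn2 are assumed to be listed in the include file while dn2 is also in the exclude file. With `keepExcluded` set to false it follows the skip condition of the loop quoted above. Setting it to true shows one possible way to keep the counts consistent (report the unknown excluded host as dead and mark it decommissioned); that variant is an assumption, not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the dead-node listing in
// DatanodeManager#getDatanodeListForReport. Not actual HDFS code.
public class DecomDeadNodeSketch {

    // Stand-in for DatanodeDescriptor: a host name plus two state flags.
    static final class Node {
        final String host;
        final boolean dead;
        final boolean decommissioned;

        Node(String host, boolean dead, boolean decommissioned) {
            this.host = host;
            this.dead = dead;
            this.decommissioned = decommissioned;
        }
    }

    // Dead nodes = dead entries in datanodeMap, plus include-file hosts the NN
    // has never heard from. keepExcluded == false mirrors the quoted loop
    // (excluded hosts are skipped); true reports them as dead + decommissioned.
    static List<Node> deadNodes(Map<String, Node> datanodeMap,
                                Set<String> includedHosts,
                                Set<String> excludedHosts,
                                boolean keepExcluded) {
        List<Node> nodes = new ArrayList<>();
        for (Node n : datanodeMap.values()) {
            if (n.dead) {
                nodes.add(n);
            }
        }
        for (String host : includedHosts) {
            boolean excluded = excludedHosts.contains(host);
            if (datanodeMap.containsKey(host) || (excluded && !keepExcluded)) {
                continue;
            }
            nodes.add(new Node(host, true, excluded));
        }
        return nodes;
    }

    static long countDecomDead(Map<String, Node> datanodeMap, Set<String> included,
                               Set<String> excluded, boolean keepExcluded) {
        return deadNodes(datanodeMap, included, excluded, keepExcluded).stream()
                .filter(n -> n.decommissioned)
                .count();
    }

    // Decommissioned dead counts:
    // {before restart, after restart (current), after restart (sketched variant)}.
    static long[] demo() {
        Set<String> included = Set.of("dn1", "dn2");
        Set<String> excluded = Set.of("dn2"); // dn2 was decommissioned, then died

        // Before restart: dn2 is still in datanodeMap as dead + decommissioned.
        Map<String, Node> before = new HashMap<>();
        before.put("dn1", new Node("dn1", false, false));
        before.put("dn2", new Node("dn2", true, true));

        // After restart: only the live dn1 has re-registered.
        Map<String, Node> after = new HashMap<>();
        after.put("dn1", new Node("dn1", false, false));

        return new long[] {
            countDecomDead(before, included, excluded, false),
            countDecomDead(after, included, excluded, false),
            countDecomDead(after, included, excluded, true)
        };
    }

    public static void main(String[] args) {
        long[] r = demo();
        System.out.println("before restart:          " + r[0]); // 1
        System.out.println("after restart (current): " + r[1]); // 0
        System.out.println("after restart (variant): " + r[2]); // 1
    }
}
```

Running `main` prints 1, 0 and 1: the decommissioned dead node is counted before the restart, disappears from the report after it under the current skip condition, and is counted again under the sketched variant.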

      Attachments

        1. HDFS-8056-2.patch
          4 kB
          Ming Ma
        2. HDFS-8056.patch
          3 kB
          Ming Ma


          People

            Assignee: Ming Ma (mingma)
            Reporter: Ming Ma (mingma)
            Votes: 0
            Watchers: 10

