Hadoop HDFS / HDFS-8056

Decommissioned dead nodes should continue to be counted as dead after NN restart

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: None
    • Labels:
    • Hadoop Flags: Reviewed

      Description

      We had an offline discussion about this with Andrew Wang and Colin P. McCabe. I am raising it here to get more input and to get the patch in.

      Dead nodes are tracked by DatanodeManager's datanodeMap. However, after the NN restarts, nodes that were already dead before the restart are not in datanodeMap. DatanodeManager's getDatanodeListForReport adds those dead nodes back to the report, but not if they are in the exclude file:

          if (listDeadNodes) {
            for (InetSocketAddress addr : includedNodes) {
              if (foundNodes.matchedBy(addr) || excludedNodes.match(addr)) {
                continue;
              }
              // The remaining nodes are ones that are referenced by the hosts
              // files but that we do not know about, i.e. that we have never
              // heard from, e.g. an entry that is no longer part of the cluster
              // or a bogus entry that was given in the hosts files.
              //
              // If the host file entry specified the xferPort, we use that.
              // Otherwise, we guess that it is the default xfer port.
              // We can't ask the DataNode what it had configured, because it's
              // dead.
              DatanodeDescriptor dn = new DatanodeDescriptor(new DatanodeID(addr
                      .getAddress().getHostAddress(), addr.getHostName(), "",
                      addr.getPort() == 0 ? defaultXferPort : addr.getPort(),
                      defaultInfoPort, defaultInfoSecurePort, defaultIpcPort));
              setDatanodeDead(dn);
              nodes.add(dn);
            }
          }
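      To make the skipped case concrete, here is a minimal standalone sketch that models only the filtering logic of the loop above. It is not Hadoop code: hosts are plain strings instead of InetSocketAddress, and the names ReportedNode, deadNodesAfterRestart and knownHosts (standing in for foundNodes, i.e. what is in datanodeMap) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the quoted report loop; not Hadoop code.
public class DeadNodeReportSketch {
    // Hypothetical stand-in for DatanodeDescriptor: a host name plus two flags.
    record ReportedNode(String host, boolean dead, boolean decommissioned) {}

    // Mirrors the quoted loop: included hosts the NN does not know about are
    // reported as dead, unless they are also listed in the exclude file.
    static List<ReportedNode> deadNodesAfterRestart(List<String> includedHosts,
                                                    Set<String> knownHosts,
                                                    Set<String> excludedHosts) {
        List<ReportedNode> nodes = new ArrayList<>();
        for (String host : includedHosts) {
            if (knownHosts.contains(host) || excludedHosts.contains(host)) {
                continue; // excluded (decommissioned) hosts are silently dropped
            }
            nodes.add(new ReportedNode(host, true, false));
        }
        return nodes;
    }

    public static void main(String[] args) {
        // dn2 was decommissioned and dead before the restart, so it appears in
        // both the include and exclude files; datanodeMap is empty after restart.
        List<ReportedNode> dead = deadNodesAfterRestart(
                List.of("dn1", "dn2"), Set.of(), Set.of("dn2"));
        System.out.println(dead.size() + " dead node(s) reported");
    }
}
```

      Running main prints "1 dead node(s) reported": dn2 disappears from the dead list entirely once the NN has restarted.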
      

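      The issue is that the decommissioned dead node counts exposed through JMX differ before and after an NN restart. Before the restart, a decommissioned node that has died is still in datanodeMap and is reported as dead. After the restart it is no longer in datanodeMap, and because decommissioned nodes are listed in the exclude file, the loop above skips it, so it is no longer reported at all. It would be better to make this consistent across NN restarts.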
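      One way to get that consistency, sketched here on the same simplified model, is to stop skipping excluded hosts and instead report them as dead and decommissioned. This is an illustration under stated assumptions, not the attached patch: ReportedNode, deadNodesAfterRestart, countDecommissionedDead and knownHosts are hypothetical names, and treating "in the exclude file" as "decommissioned" is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a consistent report; not the HDFS-8056 patch.
public class ConsistentDeadNodeReportSketch {
    // Hypothetical stand-in for DatanodeDescriptor: a host name plus two flags.
    record ReportedNode(String host, boolean dead, boolean decommissioned) {}

    static List<ReportedNode> deadNodesAfterRestart(List<String> includedHosts,
                                                    Set<String> knownHosts,
                                                    Set<String> excludedHosts) {
        List<ReportedNode> nodes = new ArrayList<>();
        for (String host : includedHosts) {
            if (knownHosts.contains(host)) {
                continue; // nodes already in datanodeMap are reported elsewhere
            }
            // Assumption: an excluded host is treated as decommissioned, so it
            // is still counted as dead instead of vanishing from the report.
            nodes.add(new ReportedNode(host, true, excludedHosts.contains(host)));
        }
        return nodes;
    }

    // Counts nodes that are both dead and decommissioned, the number that
    // should stay the same across an NN restart.
    static long countDecommissionedDead(List<ReportedNode> nodes) {
        return nodes.stream().filter(n -> n.dead() && n.decommissioned()).count();
    }

    public static void main(String[] args) {
        List<ReportedNode> dead = deadNodesAfterRestart(
                List.of("dn1", "dn2"), Set.of(), Set.of("dn2"));
        System.out.println(dead.size() + " dead, "
                + countDecommissionedDead(dead) + " decommissioned dead");
        // prints "2 dead, 1 decommissioned dead"
    }
}
```

      With this change dn2 is still counted as a decommissioned dead node after the restart, matching what was reported before it.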

        Attachments

        1. HDFS-8056-2.patch (4 kB, Ming Ma)
        2. HDFS-8056.patch (3 kB, Ming Ma)


            People

            • Assignee: Ming Ma (mingma)
              Reporter: Ming Ma (mingma)
            • Votes: 0
              Watchers: 10

              Dates

              • Created:
                Updated:
                Resolved: