Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Reviewed
Description
We discussed this offline with andrew.wang and cmccabe. I am raising it here to get more input and to get the patch in place.
Dead nodes are tracked by DatanodeManager's datanodeMap. However, after the NN restarts, nodes that were dead before the restart are no longer in datanodeMap. DatanodeManager's getDatanodeListForReport adds those dead nodes back from the include list, but not if they are also in the exclude file.
if (listDeadNodes) {
  for (InetSocketAddress addr : includedNodes) {
    if (foundNodes.matchedBy(addr) || excludedNodes.match(addr)) {
      continue;
    }
    // The remaining nodes are ones that are referenced by the hosts
    // files but that we do not know about, ie that we have never
    // head from. Eg. an entry that is no longer part of the cluster
    // or a bogus entry was given in the hosts files
    //
    // If the host file entry specified the xferPort, we use that.
    // Otherwise, we guess that it is the default xfer port.
    // We can't ask the DataNode what it had configured, because it's
    // dead.
    DatanodeDescriptor dn = new DatanodeDescriptor(new DatanodeID(addr
        .getAddress().getHostAddress(), addr.getHostName(), "",
        addr.getPort() == 0 ? defaultXferPort : addr.getPort(),
        defaultInfoPort, defaultInfoSecurePort, defaultIpcPort));
    setDatanodeDead(dn);
    nodes.add(dn);
  }
}
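The issue here is that the decommissioned dead node information exposed through JMX differs before and after an NN restart: a dead node that is also listed in the exclude file is reported while it is still in datanodeMap, but is skipped by getDatanodeListForReport once the NN has restarted. It would be better to make this consistent across NN restarts.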
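
To make the inconsistency concrete, here is a minimal standalone sketch of the filtering logic. It is not the HDFS code: hosts are modeled as plain strings, and the class name DeadNodeReportSketch, the method deadNodesForReport, and the skipExcluded flag are all hypothetical names introduced only for illustration. With skipExcluded set to true it mirrors the current check; with false it shows what the report would contain if excluded nodes were not skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DeadNodeReportSketch {

    /**
     * Hypothetical simplification of the listDeadNodes branch.
     * Returns the included hosts that the NN has not heard from,
     * optionally skipping hosts that are also in the exclude list.
     */
    static List<String> deadNodesForReport(Set<String> includedNodes,
                                           Set<String> excludedNodes,
                                           Set<String> foundNodes,
                                           boolean skipExcluded) {
        List<String> dead = new ArrayList<>();
        for (String addr : includedNodes) {
            if (foundNodes.contains(addr)) {
                continue; // already known to the NN, reported elsewhere
            }
            if (skipExcluded && excludedNodes.contains(addr)) {
                continue; // current behavior: excluded dead nodes are dropped
            }
            dead.add(addr);
        }
        return dead;
    }

    public static void main(String[] args) {
        // Assumed scenario after an NN restart: dn1 is alive, dn2 was
        // decommissioned and dead before the restart, dn3 is simply dead.
        Set<String> included = new LinkedHashSet<>(List.of("dn1", "dn2", "dn3"));
        Set<String> excluded = Set.of("dn2");
        Set<String> found = Set.of("dn1");

        // Mirrors the existing check: dn2 disappears from the report.
        System.out.println(deadNodesForReport(included, excluded, found, true));
        // Without the exclude check: dn2 is reported as dead again.
        System.out.println(deadNodesForReport(included, excluded, found, false));
    }
}
```

In this assumed scenario the first call prints [dn3] and the second prints [dn2, dn3], which is the gap between the post-restart and pre-restart views of the dead node list.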