Description
I replaced one name node with a new one on a Hadoop 2.6.0 cluster with HA support. To refresh the name node list on the data nodes without restarting them, I ran the following command:
$ hdfs dfsadmin -refreshNamenodes datanode-host:port
However, I got the following error:
refreshNamenodes: HA does not currently support adding a new standby to a running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
I checked the 2.6.0 code and the error was thrown by the following code snippet, which led me to this JIRA.
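void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
  // Collect the name node addresses this data node currently talks to.
  Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
  for (BPServiceActor actor : bpServices) {
    oldAddrs.add(actor.getNNSocketAddress());
  }
  Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);

  // Any difference between the old and new address sets is rejected.
  if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
    throw new IOException(
        "HA does not currently support adding a new standby to a running DN. "
        + "Please do a rolling restart of DNs to reconfigure the list of NNs.");
  }
}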
It looks like the refreshNamenodes command is an unfinished feature.
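To illustrate why replacing a single name node trips this check, here is a minimal standalone sketch. It is not Hadoop code: the class name RefreshCheckSketch, the host names, and port 8020 are made up for illustration, and it uses plain java.util sets instead of Guava's Sets.symmetricDifference.

```java
import java.net.InetSocketAddress;
import java.util.HashSet;
import java.util.Set;

// Hypothetical standalone sketch; not part of the Hadoop code base.
public class RefreshCheckSketch {

    // Elements that are in exactly one of the two sets.
    static <T> Set<T> symmetricDifference(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        for (T x : b) {
            // Present in both sets: drop it. Only in b: keep it.
            if (!result.remove(x)) {
                result.add(x);
            }
        }
        return result;
    }

    // Mirrors the condition in refreshNNList: any change is treated as an error.
    static <T> boolean nnListChanged(Set<T> oldAddrs, Set<T> newAddrs) {
        return !symmetricDifference(oldAddrs, newAddrs).isEmpty();
    }

    public static void main(String[] args) {
        // Assumed example addresses; createUnresolved avoids any DNS lookup.
        InetSocketAddress nn1 = InetSocketAddress.createUnresolved("nn1.example.com", 8020);
        InetSocketAddress nn2 = InetSocketAddress.createUnresolved("nn2.example.com", 8020);
        InetSocketAddress nn3 = InetSocketAddress.createUnresolved("nn3.example.com", 8020);

        Set<InetSocketAddress> oldAddrs = new HashSet<>();
        oldAddrs.add(nn1);
        oldAddrs.add(nn2);

        // nn2 has been replaced by nn3.
        Set<InetSocketAddress> newAddrs = new HashSet<>();
        newAddrs.add(nn1);
        newAddrs.add(nn3);

        System.out.println("unchanged list rejected: " + nnListChanged(oldAddrs, oldAddrs));
        System.out.println("replaced NN rejected: " + nnListChanged(oldAddrs, newAddrs));
    }
}
```

In this sketch the replaced name node and its replacement both land in the symmetric difference, so the check only passes when the configured list is identical to the running one.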
Unfortunately, bringing up a new name node on a replacement instance is critical for auto-provisioning a Hadoop cluster with HDFS HA support. Without this support, the HA feature cannot really be used. I also observed that the new standby name node on the replacement instance can get stuck in safe mode because no data nodes check in with it. Even a rolling restart may take quite some time on a big cluster, for example one with 4000 data nodes, and restarting DNs is too intrusive to be a preferred operation in production. It also increases the chance of a double failure, because the standby name node is not ready for a failover if the current active name node fails.
Attachments
Issue Links
- is related to HDFS-1623 High Availability Framework for HDFS NN (Closed)