[HDFS-8693] refreshNamenodes does not support adding a new standby to a running DN - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 2.6.0
Fix Version/s: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
Component/s: datanode, ha
Labels: None
Hadoop Flags: Reviewed

Description

I tried to run the following command on a Hadoop 2.6.0 cluster with HA support

$ hdfs dfsadmin -refreshNamenodes datanode-host:port

to refresh name nodes on data nodes after I replaced one name node with a new one so that I don't need to restart the data nodes. However, I got the following error:

refreshNamenodes: HA does not currently support adding a new standby to a running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.

I checked the 2.6.0 code and the error was thrown by the following code snippet, which led me to this JIRA.

    void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
      Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
      for (BPServiceActor actor : bpServices) {
        oldAddrs.add(actor.getNNSocketAddress());
      }
      Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);

      if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
        // Keep things simple for now -- we can implement this at a later date.
        throw new IOException(
            "HA does not currently support adding a new standby to a running DN. " +
            "Please do a rolling restart of DNs to reconfigure the list of NNs.");
      }
    }

It looks like the refreshNamenodes command is an incomplete feature.

Unfortunately, bringing up a new name node on a replacement instance is critical for auto-provisioning a Hadoop cluster with HDFS HA support. Without this support, the HA feature cannot really be used. I also observed that the new standby name node on the replacement instance could get stuck in safe mode because no data nodes check in with it. Even with a rolling restart, it may take quite some time to restart all data nodes in a big cluster, for example one with 4000 data nodes. Restarting DNs is also far too intrusive and is not a desirable operation in production. It also increases the chance of a double failure, because the standby name node is not really ready for a failover if the current active name node fails.
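For illustration only, here is a minimal sketch of how the check in refreshNNList quoted above could be relaxed: instead of rejecting any difference between the old and new NN lists, split the difference into addresses to add and addresses to remove. This is not the patch attached to this issue. The class NNListDiff and its members are hypothetical names, and the sketch uses only java.util collections rather than Hadoop or Guava classes.

```java
import java.net.InetSocketAddress;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper (not part of Hadoop): splits an NN list change into additions and removals. */
final class NNListDiff {
    /** Addresses present in the new list but not in the old one. */
    final Set<InetSocketAddress> toAdd;
    /** Addresses present in the old list but not in the new one. */
    final Set<InetSocketAddress> toRemove;

    private NNListDiff(Set<InetSocketAddress> toAdd, Set<InetSocketAddress> toRemove) {
        this.toAdd = toAdd;
        this.toRemove = toRemove;
    }

    /** Computes both one-sided set differences; the inputs are not modified. */
    static NNListDiff diff(Collection<InetSocketAddress> oldAddrs,
                           Collection<InetSocketAddress> newAddrs) {
        Set<InetSocketAddress> toAdd = new LinkedHashSet<>(newAddrs);
        toAdd.removeAll(oldAddrs);
        Set<InetSocketAddress> toRemove = new LinkedHashSet<>(oldAddrs);
        toRemove.removeAll(newAddrs);
        return new NNListDiff(toAdd, toRemove);
    }

    /** True when the old and new lists contain the same addresses, i.e. nothing to refresh. */
    boolean isEmpty() {
        return toAdd.isEmpty() && toRemove.isEmpty();
    }
}
```

Under this assumption, a caller would start a new actor for each address in toAdd and stop the actor for each address in toRemove, and would do nothing when isEmpty() returns true. The union of toAdd and toRemove is exactly the symmetric difference that the 2.6.0 code tests for emptiness.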

Attachments

- HDFS-8693.02.patch, 07/Nov/16 12:03, 5 kB, Ajith S
- HDFS-8693.03.patch, 21/Aug/17 05:08, 5 kB, Ajith S
- HDFS-8693.1.patch, 30/Mar/16 07:24, 5 kB, Ajith S
- HDFS-8693-03-addendum.patch, 12/Feb/18 10:23, 1 kB, Brahma Reddy Battula
- HDFS-8693-03-Addendum-branch-2.patch, 12/Feb/18 17:07, 3 kB, Brahma Reddy Battula

Issue Links

is related to: HDFS-1623 High Availability Framework for HDFS NN (Closed)

People

Assignee: Ajith S

Reporter: Jian Fang

Votes: 1

Watchers: 16

Dates

Created: 29/Jun/15 17:32

Updated: 15/Feb/18 07:12

Resolved: 12/Feb/18 17:09