Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Duplicate
Description
I have backported multi-standby NameNode support to our own HDFS version and found an issue with EditLog rolling.
Steps to reproduce:
1. Original state:
   nn1 active
   nn2 standby
   nn3 standby
2. Stop nn1.
3. New state:
   nn1 stopped
   nn2 active
   nn3 standby
4. nn3 is unable to trigger a roll of the active NN. Its rollEditLog call goes to the stopped nn1 and fails with "Connection refused":
[2018-08-22T10:33:38.025+08:00] [WARN] namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java 307) [Edit log tailer] : Unable to trigger a roll of the active NN
java.net.ConnectException: Call From <nn3 hostname> to <nn1 hostname> failed on connection exception: java.net.ConnectException: Connection refused; For more details see:http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.GeneratedConstructorAccessor17.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:782)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:722)
at org.apache.hadoop.ipc.Client.call(Client.java:1536)
at org.apache.hadoop.ipc.Client.call(Client.java:1463)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:237)
at com.sun.proxy.$Proxy16.rollEditLog(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:148)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$2.doWork(EditLogTailer.java:301)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$2.doWork(EditLogTailer.java:298)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$MultipleNameNodeProxy.call(EditLogTailer.java:414)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:304)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$800(EditLogTailer.java:69)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:346)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:315)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:332)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:328)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:521)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:485)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:658)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:756)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:419)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1585)
at org.apache.hadoop.ipc.Client.call(Client.java:1502)
... 14 more
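The behaviour expected in step 4 is that the standby moves on to the next NameNode when one candidate is unreachable, instead of failing on nn1. The following is a minimal, self-contained sketch of that idea, not the code in EditLogTailer. The class FakeNameNode, its fields, and the use of IllegalStateException to model a standby rejecting the call are all hypothetical stand-ins for the real RPC proxies and exceptions.

```java
import java.net.ConnectException;
import java.util.Arrays;
import java.util.List;

final class LogRollFailoverSketch {

    /** Hypothetical stand-in for an RPC proxy to a single NameNode. */
    static final class FakeNameNode {
        final String name;
        final boolean running;
        final boolean active;

        FakeNameNode(String name, boolean running, boolean active) {
            this.name = name;
            this.running = running;
            this.active = active;
        }

        /** Simulates the rollEditLog RPC against this NameNode. */
        void rollEditLog() throws ConnectException {
            if (!running) {
                // A stopped NameNode refuses the connection, as in the log above.
                throw new ConnectException("Connection refused: " + name);
            }
            if (!active) {
                // Assumption: a standby rejects the call; modelled with an unchecked exception.
                throw new IllegalStateException(name + " is in standby state");
            }
        }
    }

    /**
     * Tries each candidate once, in order, and moves on when a candidate is
     * unreachable or not active. Returns the name of the NameNode that
     * accepted the roll, or null if none did.
     */
    static String triggerActiveLogRoll(List<FakeNameNode> candidates) {
        for (FakeNameNode nn : candidates) {
            try {
                nn.rollEditLog();
                return nn.name;
            } catch (ConnectException | IllegalStateException e) {
                // Do not give up on the first failure; try the next NameNode.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // State after step 3, seen from nn3: nn1 stopped, nn2 active.
        List<FakeNameNode> others = Arrays.asList(
            new FakeNameNode("nn1", false, false),
            new FakeNameNode("nn2", true, true));
        System.out.println(triggerActiveLogRoll(others)); // prints "nn2"
    }
}
```

With the state from step 3, the sketch skips the stopped nn1 and the roll is accepted by nn2. A real implementation would also need to bound retries and remember which NameNode last answered as active; both are omitted here.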
Issue Links
- is duplicated by: HADOOP-15684 triggerActiveLogRoll stuck on dead name node, when ConnectTimeoutException happens. (Resolved)
- is related to: HDFS-6440 Support more than 2 NameNodes (Resolved)