Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.8.0
None
None
Environment
[hadoop@namenode01 ~]$ cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)
[hadoop@namenode01 ~]$ uname -a
Linux namenode01 3.10.0-514.10.2.el7.x86_64 #1 SMP Fri Mar 3 00:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[hadoop@namenode01 ~]$ java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
Description
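During a failover triggered by manually killing the active NameNode process, fuser failed on the first attempt and the fencer fell back to nc. The nc installed on namenode02 (Ncat) rejected the -z option, yet the fencer still reported the service as down: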
2017-07-13 15:59:36,851 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
2017-07-13 15:59:36,851 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
2017-07-13 15:59:36,860 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
2017-07-13 15:59:36,861 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
2017-07-13 15:59:36,871 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: gssapi-with-mic,publickey,keyboard-interactive,password
2017-07-13 15:59:36,871 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: gssapi-with-mic
2017-07-13 15:59:36,876 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: publickey,keyboard-interactive,password
2017-07-13 15:59:36,876 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: publickey
2017-07-13 15:59:37,048 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentication succeeded (publickey).
2017-07-13 15:59:37,049 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connected to namenode02
2017-07-13 15:59:37,049 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Looking for process running on port 8020
2017-07-13 15:59:37,502 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Indeterminate response from trying to kill service. Verifying whether it is running using nc...
2017-07-13 15:59:37,556 WARN org.apache.hadoop.ha.SshFenceByTcpPort: nc -z namenode02 8020 via ssh: nc: invalid option -- 'z'
2017-07-13 15:59:37,556 WARN org.apache.hadoop.ha.SshFenceByTcpPort: nc -z namenode02 8020 via ssh: Ncat: Try `--help' or man(1) ncat for more information, usage options and help. QUITTING.
2017-07-13 15:59:37,557 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Verified that the service is down.
2017-07-13 15:59:37,557 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from namenode02 port 22
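Reading the log, the fencer appears to treat any non-zero exit from `nc -z` as proof that the service is down, so an nc that rejects `-z` is indistinguishable from a closed port. The sketch below models that inferred rule next to a stricter variant that treats an nc usage error as indeterminate. It is not the SshFenceByTcpPort source; the class and method names are hypothetical, and the exit code 2 used for the usage error is an assumed value (only the stderr text comes from the log).

```java
import java.util.Locale;

/** Hypothetical model of the nc fallback step seen in the log; not Hadoop source. */
public class NcFallbackSketch {

    enum Verdict { SERVICE_DOWN, SERVICE_RUNNING, INDETERMINATE }

    /** Rule inferred from the log: exit 0 means running, anything else is read as "down". */
    static Verdict lenientVerdict(int ncExitCode) {
        return ncExitCode == 0 ? Verdict.SERVICE_RUNNING : Verdict.SERVICE_DOWN;
    }

    /**
     * Stricter variant (an assumption, not an upstream fix): if nc printed a usage
     * error, the probe never ran, so its exit code cannot prove the service is down.
     */
    static Verdict strictVerdict(int ncExitCode, String ncStderr) {
        if (ncExitCode == 0) {
            return Verdict.SERVICE_RUNNING;
        }
        String err = ncStderr.toLowerCase(Locale.ROOT);
        if (err.contains("invalid option") || err.contains("--help")) {
            return Verdict.INDETERMINATE;
        }
        return Verdict.SERVICE_DOWN;
    }

    public static void main(String[] args) {
        // stderr text taken from the log above; the exit code 2 is an assumed value
        String stderr = "nc: invalid option -- 'z'\n"
                + "Ncat: Try `--help' or man(1) ncat for more information, usage options and help. QUITTING.";
        int exitCode = 2;
        System.out.println("lenient: " + lenientVerdict(exitCode));
        System.out.println("strict:  " + strictVerdict(exitCode, stderr));
    }
}
```

Run as-is, it prints `lenient: SERVICE_DOWN` and `strict:  INDETERMINATE`, which is the difference between the behaviour in the log and a check that refuses to fence on a broken probe.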
This was previously raised as HDFS-11308, which was closed as a duplicate of HDFS-3618; HDFS-3618 itself does not appear to have been resolved (its status is PATCH AVAILABLE).
Also, the documentation (https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html) mentions that fencing uses fuser, but it does not mention that nc is also used (apparently as a fallback) to verify that the service is down.
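Because the fallback depends on which nc variant is installed on the target host (the Ncat on namenode02 rejects `-z`), one nc-independent option is to probe the port directly and accept only an explicit connection refusal as evidence that nothing is listening. A minimal sketch, assuming the probe can reach the NameNode RPC port from wherever it runs; the class and method names and the 2-second timeout are illustrative, and this is not how Hadoop currently implements the check.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical nc-free port probe; not part of Hadoop. */
public class PortProbe {

    enum PortState { LISTENING, REFUSED, UNKNOWN }

    /**
     * Tries a TCP connect with an explicit timeout. Only a ConnectException
     * (typically "Connection refused") is reported as REFUSED; timeouts,
     * unresolvable hosts and other I/O errors stay UNKNOWN, so a failed
     * check is never mistaken for a stopped service.
     */
    static PortState probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return PortState.LISTENING;
        } catch (ConnectException e) {
            return PortState.REFUSED;
        } catch (IOException e) {
            return PortState.UNKNOWN; // timeout, unknown host, no route to host, ...
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8020;
        System.out.println(host + ":" + port + " -> " + probe(host, port, 2000));
    }
}
```

For example, `java PortProbe namenode02 8020` would be expected to print `namenode02:8020 -> REFUSED` when the host is up but nothing listens on 8020. Keeping timeouts as UNKNOWN is the design choice that matters here: a network partition then fails the fence instead of silently passing it.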