Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
0.23.3
None
Reviewed
Description
SshFenceByTcpPort currently assumes that the NN is listening on localhost. In typical setups the namenode listens only on its own hostname, so "nc -z localhost" fails to detect it even though it is running.
Here's an example in which the NN is running and listening on port 8020, but does not accept connections on localhost:8020.
    [root@xxx ~]# lsof -P -p 5286 | grep -i listen
    java 5286 root 110u IPv4 1772357 TCP xxx:8020 (LISTEN)
    java 5286 root 121u IPv4 1772397 TCP xxx:50070 (LISTEN)
    [root@xxx ~]# nc -z localhost 8020
    [root@xxx ~]# nc -z xxx 8020
    Connection to xxx 8020 port [tcp/intu-ec-svcdisc] succeeded!
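For reproducing the same check from Java, here is a minimal sketch of what `nc -z host port` tests: whether a TCP connection to that exact host and port can be opened. `PortProbe` and `isOpen` are hypothetical names, and the timeout is an arbitrary assumption.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Hypothetical helper: a rough Java analogue of "nc -z host port".
 * It only reports whether a TCP connection to host:port succeeds, so the
 * answer depends on which address the listener is bound to.
 */
final class PortProbe {
    private PortProbe() {}

    static boolean isOpen(String host, int port, int timeoutMillis) {
        // try-with-resources closes the socket whether or not connect succeeds
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;   // handshake completed: something is listening at host:port
        } catch (IOException e) {
            return false;  // refused, unresolvable, unreachable, or timed out
        }
    }
}
```

On the host above, `PortProbe.isOpen("localhost", 8020, 2000)` would be expected to return false and `PortProbe.isOpen("xxx", 8020, 2000)` true, matching the two nc runs: a connect-based probe only sees listeners bound to the address it connects to (or to the wildcard address).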
Here's the likely offending code:
    LOG.info(
        "Indeterminate response from trying to kill service. " +
        "Verifying whether it is running using nc...");
    rc = execCommand(session, "nc -z localhost 8020");
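The snippet hard-codes both `localhost` and `8020`. A minimal sketch of building the same check from the NN's configured host and port instead; `FenceCommands` and `ncCheck` are hypothetical names, and the hostname whitelist is an assumption made so the value can go into a remote shell command unquoted, not Hadoop's own validation.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper: builds the liveness-check command from the
 * NameNode's configured address rather than "localhost 8020".
 * Names are illustrative only, not part of Hadoop's API.
 */
final class FenceCommands {
    // Assumed shape of a hostname or IP literal; the first character may not
    // be '-' so the value cannot be mistaken for an nc option.
    private static final Pattern SAFE_HOST = Pattern.compile("[A-Za-z0-9:][A-Za-z0-9.:-]*");

    private FenceCommands() {}

    static String ncCheck(String host, int port) {
        if (!SAFE_HOST.matcher(host).matches()) {
            throw new IllegalArgumentException("unexpected characters in host: " + host);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        // nc -z only attempts a connection; exit status 0 means it succeeded
        return "nc -z " + host + " " + port;
    }
}
```

In the fencer, the returned string would be passed to `execCommand(session, ...)` in place of the literal "nc -z localhost 8020".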
Naively, we could point netcat at the correct hostname (since the NN ought to be listening on the hostname it is configured with), or just use fuser. fuser finds the processes using a port regardless of which IP address the socket is bound to:
    [root@xxx ~]# fuser 1234/tcp
    1234/tcp: 6766 6768
    [root@xxx ~]# jobs
    [1]- Running nc -l localhost 1234 &
    [2]+ Running nc -l rhel56-18.ent.cloudera.com 1234 &
    [root@xxx ~]# sudo lsof -P | grep -i LISTEN | grep -i 1234
    nc 6766 root 3u IPv4 2563626 TCP localhost:1234 (LISTEN)
    nc 6768 root 3u IPv4 2563671 TCP xxx:1234 (LISTEN)
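To illustrate why a port-based lookup is independent of the bound address: on Linux, /proc/net/tcp lists each socket's local address as hex `IP:PORT` together with a state code (`0A` is LISTEN) and the socket inode, so matching on the port field alone finds listeners on localhost and on the hostname alike. This is a simplified sketch of the idea, not a description of how fuser is implemented; `ProcNetTcp` and `listeningInodes` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: scans lines in /proc/net/tcp format and returns the
 * inodes of sockets in LISTEN state on a given port, whatever local IP
 * they are bound to.
 */
final class ProcNetTcp {
    private static final String LISTEN = "0A"; // TCP_LISTEN state code in /proc/net/tcp

    private ProcNetTcp() {}

    static List<Long> listeningInodes(List<String> lines, int port) {
        List<Long> inodes = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            // Socket entries start with "N:"; this skips the header row.
            if (f.length < 10 || !f[0].endsWith(":")) {
                continue;
            }
            String local = f[1];  // e.g. 0100007F:04D2 (port is hex after the last ':')
            int localPort = Integer.parseInt(local.substring(local.lastIndexOf(':') + 1), 16);
            if (localPort == port && LISTEN.equals(f[3])) {
                inodes.add(Long.parseLong(f[9]));  // tenth column is the socket inode
            }
        }
        return inodes;
    }
}
```

On a Linux host the lines could come from `Files.readAllLines(Paths.get("/proc/net/tcp"))` (plus `/proc/net/tcp6` for IPv6 sockets). Turning an inode into a PID still requires scanning `/proc/<pid>/fd`, which is roughly the extra step fuser takes care of.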