Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Slider 0.92
None
Description
In an environment where Hadoop worker nodes bind the NodeManager to an interface whose hostname differs from the one returned by socket.getfqdn(), we are unable to introspect running jobs. In our test environment, for example, the NodeManager is reachable as f-bcpc-vm3, while socket.getfqdn() returns bcpc-vm3. That is the hostname bound to the management interface, not to the interface that carries Hadoop/production traffic.
For example, running slider registry --name slider_poc --listexp produces the following output in the ResourceManager logs:
2018-01-26 17:30:32,147 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: ubuntu is accessing unchecked http://bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports which is the app master GUI of application_1516910361403_0094 owned by ubuntu
2018-01-26 17:31:13,639 WARN org.mortbay.log: /proxy/application_1516910361403_0094/ws/v1/slider/publisher/exports: java.net.ConnectException: Connection timed out (Connection timed out)
Note that the proxy targets http://bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports, whereas it should target http://f-bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports. Renaming the host to f-bcpc-vm3 results in the correct behavior.
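The failing URL appears to be built from the hostname the agent advertised at registration. The minimal sketch below shows how to spot the mismatch on a worker node. It assumes you supply the NodeManager host and the app master port by hand. exports_url and find_mismatch are hypothetical helpers, not Slider APIs, and the URL path is copied from the log above.

```python
import socket

# Path copied from the ResourceManager log in this issue.
EXPORTS_PATH = "/ws/v1/slider/publisher/exports"


def exports_url(host, port):
    """Build the publisher exports URL for a given host and port."""
    return "http://%s:%d%s" % (host, port, EXPORTS_PATH)


def find_mismatch(nodemanager_host, port, agent_host=None):
    """Return (advertised_url, expected_url) if the hosts differ, else None.

    agent_host defaults to socket.getfqdn(), the name the agent would
    advertise; nodemanager_host is the name the NodeManager is reachable on.
    """
    advertised = agent_host if agent_host is not None else socket.getfqdn()
    if advertised.lower() == nodemanager_host.lower():
        return None
    return exports_url(advertised, port), exports_url(nodemanager_host, port)


if __name__ == "__main__":
    # Values taken from the example in this issue.
    print(find_mismatch("f-bcpc-vm3.bcpc.example.com", 46391,
                        agent_host="bcpc-vm3.bcpc.example.com"))
```

With the hosts from this issue, the first URL returned is the unreachable one from the log, and the second is the one the proxy should have used.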
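Perhaps hostname.py could be instructed to consult one of the following properties before the agent registers:
yarn.nodemanager.address
yarn.nodemanager.bind-host
yarn.nodemanager.hostname
The hostname is currently obtained from hostname.public_hostname() when Register.py builds the registration message: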
register = {
    'responseId': int(id),
    'timestamp': timestamp,
    'label': self.config.getLabel(),
    'publicHostname': hostname.public_hostname(),  # the value this issue is about
    'agentVersion': version,
    'actualState': actualState,
    'expectedState': expectedState,
    'allocatedPorts': allocated_ports,
    'logFolders': log_folders,
    'tags': tags,
}
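One possible shape for the suggestion above is the minimal sketch below. It reads yarn-site.xml, takes the first of the three properties that names a concrete host, and falls back to socket.getfqdn() otherwise. Several details are assumptions rather than Slider's actual fix: the property order, treating 0.0.0.0 and unresolved ${...} values as "no answer", and the helper names read_yarn_site, host_part and resolve_public_hostname.

```python
import socket
import xml.etree.ElementTree as ET

# Assumed lookup order; the issue lists these properties but no priority.
CANDIDATE_PROPERTIES = (
    "yarn.nodemanager.hostname",
    "yarn.nodemanager.address",
    "yarn.nodemanager.bind-host",
)

# Values that name no specific interface and so cannot be advertised.
WILDCARD_HOSTS = {"", "0.0.0.0", "::"}


def read_yarn_site(xml_text):
    """Parse Hadoop <configuration> XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def host_part(value):
    """Drop a trailing ':port', e.g. 'f-bcpc-vm3:45454' -> 'f-bcpc-vm3'."""
    host, sep, port = value.rpartition(":")
    return host if sep and port.isdigit() else value


def resolve_public_hostname(props):
    """Return the first concrete NodeManager host found, else socket.getfqdn()."""
    for key in CANDIDATE_PROPERTIES:
        value = props.get(key, "")
        if "${" in value:
            # Unresolved Hadoop variable reference; this sketch does not expand it.
            continue
        host = host_part(value)
        if host not in WILDCARD_HOSTS:
            return host
    return socket.getfqdn()


if __name__ == "__main__":
    sample = """<configuration>
  <property>
    <name>yarn.nodemanager.hostname</name>
    <value>f-bcpc-vm3.bcpc.example.com</value>
  </property>
  <property>
    <name>yarn.nodemanager.address</name>
    <value>${yarn.nodemanager.hostname}:0</value>
  </property>
</configuration>"""
    print(resolve_public_hostname(read_yarn_site(sample)))  # f-bcpc-vm3.bcpc.example.com
```

In Register.py, the 'publicHostname' entry could then use the resolved value in place of hostname.public_hostname(). This sketch leaves open where the agent would locate yarn-site.xml.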