Details
Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: 2.9.1
Fix Version/s: None
Component/s: None
Description
I'm experimenting with Hadoop 2.9.1 to launch applications in Docker containers. Inside the container task, we get the container's hostname using
InetAddress.getLocalHost().getHostName()
This works fine with LXC. However, it throws the following exception when I enable the Docker runtime with:
YARN_CONTAINER_RUNTIME_TYPE=docker YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=test4
The exception:
java.net.UnknownHostException: ctr-1541488751855-0023-01-000003: ctr-1541488751855-0023-01-000003: Temporary failure in name resolution
    at java.net.InetAddress.getLocalHost(InetAddress.java:1506)
    at com.linkedin.tony.TaskExecutor.registerAndGetClusterSpec(TaskExecutor.java:204)
    at com.linkedin.tony.TaskExecutor.main(TaskExecutor.java:109)
Caused by: java.net.UnknownHostException: ctr-1541488751855-0023-01-000003: Temporary failure in name resolution
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
    at java.net.InetAddress.getLocalHost(InetAddress.java:1501)
    ... 2 more
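The trace shows that `InetAddress.getLocalHost()` already has the hostname (`ctr-1541488751855-0023-01-000003`) and fails only when resolving it through the name service (`getAddressesFromNameService`). As a possible task-side stopgap, here is a minimal sketch that falls back to the raw kernel hostname when that resolution fails. `ContainerHostname` is a hypothetical name, and the fallback assumes a Linux container where `/proc/sys/kernel/hostname` is readable. It does not restore the missing /etc/hosts entry, so code that needs the container's address (not just its name) would still fail.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: hostname lookup that tolerates a missing /etc/hosts entry.
final class ContainerHostname {
    private ContainerHostname() {}

    static String hostname() {
        try {
            // getLocalHost() reads the local hostname, then resolves it via the
            // name service (/etc/hosts, DNS); that resolution is what fails above.
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            // Fall back to the unresolved hostname the container was started with.
            return kernelHostname();
        }
    }

    // Linux-only assumption: the kernel exposes the current hostname here.
    static String kernelHostname() {
        try {
            byte[] raw = Files.readAllBytes(Paths.get("/proc/sys/kernel/hostname"));
            return new String(raw, StandardCharsets.UTF_8).trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```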
After some research online, this appears to be caused by a missing /etc/hosts entry for the container's hostname. Looking at /etc/hosts inside the container confirms that the entry is missing:
pi@pi-aw:~/docker/$ docker ps
CONTAINER ID   IMAGE   COMMAND                    CREATED        STATUS                  PORTS   NAMES
71e3e9df8bc6   test4   "/entrypoint.sh bash..."   1 second ago   Up Less than a second           container_1541488751855_0028_01_000001
29d31f0327d1   test3   "/entrypoint.sh bash"      18 hours ago   Up 18 hours                     blissful_turing
pi@pi-aw:~/docker/$ de 71e3e9df8bc6
groups: cannot find name for group ID 1000
groups: cannot find name for group ID 116
groups: cannot find name for group ID 126
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.
pi@ctr-1541488751855-0028-01-000001:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_000001$ cat /etc/hosts
127.0.0.1 localhost
192.168.0.14 pi-aw
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
pi@ctr-1541488751855-0028-01-000001:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_000001$
If I launch the same image directly with Docker (without YARN), /etc/hosts does contain the entry:
pi@61f173f95631:~$ cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.3 61f173f95631
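To make the comparison between the two files concrete, here is a minimal Java sketch that checks whether an /etc/hosts-style file maps a given hostname to any address. The names `HostsCheck` and `hasEntry` are hypothetical, and the parsing assumes the usual hosts-file layout: address first, then the canonical name and aliases, with `#` starting a comment.

```java
// Hypothetical diagnostic: does a hosts file contain an entry for this name?
final class HostsCheck {
    private HostsCheck() {}

    static boolean hasEntry(String hostsContent, String hostname) {
        for (String line : hostsContent.split("\n")) {
            // Strip trailing comments and surrounding whitespace (including '\r').
            int hash = line.indexOf('#');
            String entry = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (entry.isEmpty()) {
                continue;
            }
            String[] fields = entry.split("\\s+");
            // fields[0] is the IP address; host names start at index 1.
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equalsIgnoreCase(hostname)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Given the YARN container's file and `ctr-1541488751855-0028-01-000001`, `hasEntry` returns false; given the standalone container's file and `61f173f95631`, it returns true.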
Here is my container-executor.cfg:
min.user.id=100
yarn.nodemanager.linux-container-executor.group=hadoop
[docker]
module.enabled=true
docker.binary=/usr/bin/docker
docker.allowed.capabilities=SYS_CHROOT,MKNOD,SETFCAP,SETPCAP,FSETID,CHOWN,AUDIT_WRITE,SETGID,NET_RAW,FOWNER,SETUID,DAC_OVERRIDE,KILL,NET_BIND_SERVICE
docker.allowed.networks=bridge,host,none
docker.allowed.rw-mounts=/tmp,/etc/hadoop/logs/,/private/etc/hadoop-2.9.1/logs/
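For inspecting settings such as `docker.allowed.networks`, here is a minimal Java sketch that reads a file in the layout above into per-section key/value maps. It is not Hadoop's own parser (container-executor is a native C program). The class name `ExecutorCfg` is hypothetical, and the sketch assumes `[section]` headers, `key=value` lines, and `#` comment lines, with keys before any header placed in a global section named `""`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reader for container-executor.cfg-style files.
final class ExecutorCfg {
    private ExecutorCfg() {}

    static Map<String, Map<String, String>> parse(String text) {
        Map<String, Map<String, String>> sections = new LinkedHashMap<>();
        String current = "";  // global section for keys before any [header]
        sections.put(current, new LinkedHashMap<>());
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;  // skip blank and comment lines
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                current = line.substring(1, line.length() - 1).trim();
                sections.computeIfAbsent(current, k -> new LinkedHashMap<>());
                continue;
            }
            int eq = line.indexOf('=');
            if (eq > 0) {
                // Split on the first '='; values may contain commas.
                sections.get(current).put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return sections;
    }
}
```

Parsing the configuration above, `ExecutorCfg.parse(cfg).get("docker").get("docker.allowed.networks")` returns `bridge,host,none`.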
Since I'm using an older Hadoop release (2.9.1), please let me know if this has already been fixed in a later version.