Details
- Type: Bug
- Status: Patch Available
- Priority: Critical
- Resolution: Unresolved
Description
Job submission and task execution fail if the Standby NameNode's domain name cannot be resolved on an HDFS HA cluster with the delegation token feature enabled.
The issue is triggered when a ConfiguredFailoverProxyProvider instance is created, because its constructor invokes HAUtil.cloneDelegationTokenForLogicalUri in HA mode with security enabled. In HDFS HA mode the UGI must hold a separate token for each NameNode in order to handle an Active-Standby switch, so the cloned tokens naturally have identical content.
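However, when HAUtil.cloneDelegationTokenForLogicalUri calls SecurityUtil#setTokenService, the token service is built by SecurityUtil#buildTokenService, which (when token services are IP-based) checks whether the NameNode address has been resolved. If it has not, an IllegalArgumentException is thrown and propagates up the call stack, and the job submitter or task executor fails.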
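The failure mode above can be sketched in plain Java without Hadoop on the classpath. This is a simplified model, not the actual HAUtil code: the class `TokenCloneSketch` and the methods `tokenService`, `cloneStrict`, and `cloneTolerant` are hypothetical, and only the token service strings are modelled, not the tokens themselves. The strict variant mirrors the current behavior, where one unresolvable NameNode aborts the whole clone step; the tolerant variant is a hypothetical illustration of the behavior argued for in this issue, skipping an unresolvable standby instead of failing.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class TokenCloneSketch {

    // Hypothetical stand-in for SecurityUtil#buildTokenService with IP-based token services:
    // an unresolved address is rejected with an IllegalArgumentException.
    static String tokenService(InetSocketAddress addr) {
        if (addr.isUnresolved()) {
            throw new IllegalArgumentException(new UnknownHostException(addr.getHostName()));
        }
        return addr.getAddress().getHostAddress() + ":" + addr.getPort();
    }

    // Strict variant, modelling the current behavior: one unresolvable NameNode
    // aborts the whole loop, so the proxy provider cannot be constructed.
    static List<String> cloneStrict(List<InetSocketAddress> nameNodes) {
        List<String> services = new ArrayList<>();
        for (InetSocketAddress nn : nameNodes) {
            services.add(tokenService(nn));
        }
        return services;
    }

    // Tolerant variant (hypothetical): skip NameNodes that cannot be resolved
    // and keep the token services for the ones that can.
    static List<String> cloneTolerant(List<InetSocketAddress> nameNodes) {
        List<String> services = new ArrayList<>();
        for (InetSocketAddress nn : nameNodes) {
            if (nn.isUnresolved()) {
                System.err.println("Skipping unresolved NameNode " + nn.getHostString());
                continue;
            }
            services.add(tokenService(nn));
        }
        return services;
    }

    public static void main(String[] args) {
        List<InetSocketAddress> nameNodes = List.of(
            new InetSocketAddress("127.0.0.1", 8020),                      // resolvable NameNode
            InetSocketAddress.createUnresolved("standbynamenode", 8020));  // unresolvable standby
        System.out.println(cloneTolerant(nameNodes)); // [127.0.0.1:8020]
        try {
            cloneStrict(nameNodes);
        } catch (IllegalArgumentException e) {
            System.out.println("strict clone failed: " + e.getCause());
        }
    }
}
```

A tolerant approach like this would still need a token for the skipped NameNode once its name becomes resolvable again (for example after a failover), so it only sketches the direction rather than a complete fix.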
HDFS-8068 and HADOOP-12125 attempt to fix this, but I don't think either ticket resolves it completely.
Another question many people ask is why the NameNode domain name cannot be resolved. I think there are many scenarios, for instance replacing a node after a fault, or a DNS refresh. In any case, in my opinion a Standby NameNode failure should not affect Hadoop cluster stability.
a. Code reference: org.apache.hadoop.security.SecurityUtil, lines 373-386
public static Text buildTokenService(InetSocketAddress addr) {
  String host = null;
  if (useIpForTokenService) {
    if (addr.isUnresolved()) { // host has no ip address
      throw new IllegalArgumentException(
          new UnknownHostException(addr.getHostName())
      );
    }
    host = addr.getAddress().getHostAddress();
  } else {
    host = StringUtils.toLowerCase(addr.getHostName());
  }
  return new Text(host + ":" + addr.getPort());
}
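To see why only IP-based token services hit this exception, the method above can be reproduced without Hadoop on the classpath. This is a sketch under stated assumptions: the class `BuildTokenServiceDemo` is hypothetical, `useIp` is passed as a parameter instead of reading the static `useIpForTokenService` field, a `String` is returned instead of a Hadoop `Text`, and `toLowerCase(Locale.ROOT)` stands in for Hadoop's `StringUtils.toLowerCase`. In Hadoop, `useIpForTokenService` is, as far as I know, driven by the `hadoop.security.token.service.use_ip` setting (default true).

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.Locale;

public class BuildTokenServiceDemo {

    // Mirrors the logic of SecurityUtil#buildTokenService, returning a String instead of a Text.
    static String buildTokenService(InetSocketAddress addr, boolean useIp) {
        String host;
        if (useIp) {
            if (addr.isUnresolved()) {
                // host has no IP address: the same exception type seen in the log below
                throw new IllegalArgumentException(new UnknownHostException(addr.getHostName()));
            }
            host = addr.getAddress().getHostAddress();
        } else {
            // hostname-based service: no DNS resolution is needed
            host = addr.getHostName().toLowerCase(Locale.ROOT);
        }
        return host + ":" + addr.getPort();
    }

    public static void main(String[] args) {
        InetSocketAddress standby = InetSocketAddress.createUnresolved("StandbyNameNode", 8020);
        System.out.println(buildTokenService(standby, false)); // standbynamenode:8020
        try {
            buildTokenService(standby, true);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // java.net.UnknownHostException: StandbyNameNode
        }
    }
}
```

The hostname branch never consults DNS, which is why the unresolved standby address only becomes fatal when token services are IP-based.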
b. Exception log reference:
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
    at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:515)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:170)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:761)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:691)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:150)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2713)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2747)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2729)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
    at org.apache.hadoop.fs.viewfs.ChRootedFileSystem.<init>(ChRootedFileSystem.java:106)
    at org.apache.hadoop.fs.viewfs.ViewFileSystem$1.getTargetFileSystem(ViewFileSystem.java:178)
    at org.apache.hadoop.fs.viewfs.ViewFileSystem$1.getTargetFileSystem(ViewFileSystem.java:172)
    at org.apache.hadoop.fs.viewfs.InodeTree.createLink(InodeTree.java:303)
    at org.apache.hadoop.fs.viewfs.InodeTree.<init>(InodeTree.java:377)
    at org.apache.hadoop.fs.viewfs.ViewFileSystem$1.<init>(ViewFileSystem.java:172)
    at org.apache.hadoop.fs.viewfs.ViewFileSystem.initialize(ViewFileSystem.java:172)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2713)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2747)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2729)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:176)
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:665)
    ... 35 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:498)
    ... 58 more
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: standbynamenode
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:390)
    at org.apache.hadoop.security.SecurityUtil.setTokenService(SecurityUtil.java:369)
    at org.apache.hadoop.hdfs.HAUtil.cloneDelegationTokenForLogicalUri(HAUtil.java:317)
    at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.<init>(ConfiguredFailoverProxyProvider.java:132)
    at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.<init>(ConfiguredFailoverProxyProvider.java:84)
    ... 62 more
Caused by: java.net.UnknownHostException: standbynamenode
    ... 67 more
Attachments
Issue Links
- causes
  - HADOOP-15883 Fix WebHdfsFileSystemContract test (Resolved)
- is duplicated by
  - YARN-9297 Renaming RM could cause application to crash (Resolved)