Description
In an HA environment, the yarn.resourcemanager.webapp.address configuration is optional. ApiServiceClient may produce a confusing error message like this:
19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: host1.example.com:8090
19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: host2.example.com:8090
19/10/30 20:13:42 INFO util.log: Logging initialized @2301ms
19/10/30 20:13:42 ERROR client.ApiServiceClient: Error: {}
GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
    at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:771)
    at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:266)
    at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196)
    at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:125)
    at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:105)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
    at org.apache.hadoop.yarn.service.client.ApiServiceClient.generateToken(ApiServiceClient.java:105)
    at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:290)
    at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271)
    at org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416)
    at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:125)
Caused by: KrbException: Server not found in Kerberos database (7) - LOOKING_UP_SERVER
    at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
    at java.security.jgss/sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
    at java.security.jgss/sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
    at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
    at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
    at java.security.jgss/sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
    at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695)
    ... 15 more
Caused by: KrbException: Identifier doesn't match expected value (906)
    at java.security.jgss/sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
    at java.security.jgss/sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
    at java.security.jgss/sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
    at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
    ... 21 more
19/10/30 20:13:42 ERROR client.ApiServiceClient: Fail to launch application:
java.io.IOException: java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:293)
    at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271)
    at org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416)
    at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:125)
Caused by: java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1894)
    at org.apache.hadoop.yarn.service.client.ApiServiceClient.generateToken(ApiServiceClient.java:105)
    at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:290)
    ... 6 more
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
    at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:135)
    at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:105)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
    ... 8 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
    at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:771)
    at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:266)
    at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196)
    at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:125)
    ... 12 more
Caused by: KrbException: Server not found in Kerberos database (7) - LOOKING_UP_SERVER
    at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
    at java.security.jgss/sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
    at java.security.jgss/sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
    at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
    at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
    at java.security.jgss/sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
    at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695)
    ... 15 more
Caused by: KrbException: Identifier doesn't match expected value (906)
    at java.security.jgss/sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
    at java.security.jgss/sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
    at java.security.jgss/sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
    at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
    ... 21 more
When getRMWebAddress fails to connect to either resource manager host, it falls back to the yarn-default.xml value 0.0.0.0 and attempts to acquire a TGS for HTTP/0.0.0.0, which produces the error shown above. In an HA-enabled cluster, it would be better not to use yarn.resourcemanager.webapp.address as a fallback for RM host lookup.
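A minimal sketch of the proposed behaviour, assuming the configuration is available as a plain Map. The class RmWebAddressResolver and its candidates method are hypothetical names used for illustration and are not part of Hadoop; the real client also handles HTTPS addresses and hostname-derived defaults, which are omitted here. In HA mode only the per-RM addresses are returned, so the 0.0.0.0 default is never tried.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Hadoop class: illustrates skipping the
// non-HA fallback address when HA is enabled.
final class RmWebAddressResolver {
    static final String HA_ENABLED = "yarn.resourcemanager.ha.enabled";
    static final String RM_IDS = "yarn.resourcemanager.ha.rm-ids";
    static final String WEBAPP_ADDRESS = "yarn.resourcemanager.webapp.address";

    // Returns the RM web addresses the client should try, in order.
    static List<String> candidates(Map<String, String> conf) {
        List<String> result = new ArrayList<>();
        boolean ha = Boolean.parseBoolean(conf.getOrDefault(HA_ENABLED, "false"));
        if (ha) {
            // HA: use only the per-RM keys, e.g. ...webapp.address.rm1
            for (String id : conf.getOrDefault(RM_IDS, "").split(",")) {
                String rmId = id.trim();
                if (rmId.isEmpty()) {
                    continue;
                }
                String address = conf.get(WEBAPP_ADDRESS + "." + rmId);
                if (address != null) {
                    result.add(address);
                }
            }
            // Deliberately no fallback to the unsuffixed key here.
            return result;
        }
        // Non-HA: the single unsuffixed address is the only candidate.
        String address = conf.get(WEBAPP_ADDRESS);
        if (address != null) {
            result.add(address);
        }
        return result;
    }
}
```

With this shape, an empty or exhausted candidate list in HA mode can be reported as a connection failure instead of triggering a Kerberos request for HTTP/0.0.0.0.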
In this particular cluster, connections to host1.example.com and host2.example.com both failed for the same reason: the self-signed server certificate has no valid self-signed CA certificate to verify it against. This was the original failure. It would be helpful if the error message identified this first error, rather than reporting the error from the fallback logic, which makes no sense to the user.
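One way to surface the first error is to keep the per-host connection failures and rethrow them once every host has been tried. The sketch below assumes a hypothetical Probe callback standing in for the real connection attempt; FirstErrorReporter and firstReachable are illustrative names, not ApiServiceClient code.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical helper, not a Hadoop class: reports the first connection
// failure instead of an unrelated error from later fallback logic.
final class FirstErrorReporter {
    // Stand-in for the real connection attempt to an RM web address.
    interface Probe {
        void connect(String address) throws IOException;
    }

    // Returns the first address that connects; otherwise throws the first
    // failure, with later failures attached as suppressed exceptions.
    static String firstReachable(List<String> addresses, Probe probe) throws IOException {
        IOException first = null;
        for (String address : addresses) {
            try {
                probe.connect(address);
                return address;
            } catch (IOException e) {
                IOException wrapped = new IOException("Fail to connect to: " + address, e);
                if (first == null) {
                    first = wrapped;
                } else {
                    first.addSuppressed(wrapped);
                }
            }
        }
        if (first == null) {
            throw new IOException("No resource manager web address configured");
        }
        throw first;
    }
}
```

The thrown exception keeps the underlying cause (for example, a certificate verification failure) in its cause chain, so the user sees why the hosts were unreachable.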
Attachments
Issue Links
- is blocked by: YARN-9990 Testcase fails with "Insufficient configured threads: required=16 < max=10" (Resolved)