Improve connection error message for YARN ApiServerClient

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0
    • Component/s: None
    • Labels:
      None
    • Target Version/s:

      Description

      In an HA environment, the yarn.resourcemanager.webapp.address configuration is optional. ApiServiceClient may produce a confusing error message like this:

      19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: host1.example.com:8090
      19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: host2.example.com:8090
      19/10/30 20:13:42 INFO util.log: Logging initialized @2301ms
      19/10/30 20:13:42 ERROR client.ApiServiceClient: Error: {}
      GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
      	at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:771)
      	at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:266)
      	at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196)
      	at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:125)
      	at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:105)
      	at java.base/java.security.AccessController.doPrivileged(Native Method)
      	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
      	at org.apache.hadoop.yarn.service.client.ApiServiceClient.generateToken(ApiServiceClient.java:105)
      	at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:290)
      	at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271)
      	at org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416)
      	at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
      	at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:125)
      Caused by: KrbException: Server not found in Kerberos database (7) - LOOKING_UP_SERVER
      	at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
      	at java.security.jgss/sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
      	at java.security.jgss/sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
      	at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
      	at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
      	at java.security.jgss/sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
      	at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695)
      	... 15 more
      Caused by: KrbException: Identifier doesn't match expected value (906)
      	at java.security.jgss/sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
      	at java.security.jgss/sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
      	at java.security.jgss/sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
      	at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
      	... 21 more
      19/10/30 20:13:42 ERROR client.ApiServiceClient: Fail to launch application: 
      java.io.IOException: java.lang.reflect.UndeclaredThrowableException
      	at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:293)
      	at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271)
      	at org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416)
      	at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
      	at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:125)
      Caused by: java.lang.reflect.UndeclaredThrowableException
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1894)
      	at org.apache.hadoop.yarn.service.client.ApiServiceClient.generateToken(ApiServiceClient.java:105)
      	at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:290)
      	... 6 more
      Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
      	at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:135)
      	at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:105)
      	at java.base/java.security.AccessController.doPrivileged(Native Method)
      	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
      	... 8 more
      Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
      	at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:771)
      	at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:266)
      	at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196)
      	at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:125)
      	... 12 more
      Caused by: KrbException: Server not found in Kerberos database (7) - LOOKING_UP_SERVER
      	at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
      	at java.security.jgss/sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
      	at java.security.jgss/sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
      	at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
      	at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
      	at java.security.jgss/sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
      	at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695)
      	... 15 more
      Caused by: KrbException: Identifier doesn't match expected value (906)
      	at java.security.jgss/sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
      	at java.security.jgss/sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
      	at java.security.jgss/sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
      	at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
      	... 21 more
      

      When getRMWebAddress fails to connect to either ResourceManager host, it falls back to the yarn-default.xml value 0.0.0.0 and attempts to acquire a service ticket from the TGS for HTTP/0.0.0.0, which produces the error shown here. It would be better not to use yarn.resourcemanager.webapp.address as a fallback for RM host lookup in an HA-enabled cluster.

      In this particular cluster, connections to host1.example.com and host2.example.com failed for the same reason: the self-signed server certificate could not be verified because there was no valid self-signed CA certificate to verify it against. This caused the failure in the first place. It would be nice if the error message were verbose enough to identify this first error, rather than reporting an error from the fallback logic, which makes no sense to the user.
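
The behaviour requested above could look roughly like the following sketch. It is not the code from the attached patches: the class RmAddressResolver, the Probe interface and the resolve method are hypothetical names invented for illustration, and the real ApiServiceClient may be structured quite differently. The idea is only to show a lookup that tries each configured RM address, remembers why each attempt failed, and reports those reasons instead of silently falling back to a default such as 0.0.0.0.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Hadoop: picks the first reachable RM address. */
final class RmAddressResolver {

  /** Hypothetical connection check; throws IOException when the host is unreachable. */
  @FunctionalInterface
  interface Probe {
    void connect(String hostPort) throws IOException;
  }

  private RmAddressResolver() {
  }

  /**
   * Returns the first address the probe can connect to. If every attempt
   * fails, throws an IOException that lists each address with its failure
   * reason and carries the first failure as the cause. No default address
   * is substituted.
   */
  static String resolve(List<String> rmAddresses, Probe probe) throws IOException {
    List<String> failures = new ArrayList<>();
    IOException firstFailure = null;
    for (String address : rmAddresses) {
      try {
        probe.connect(address);
        return address;
      } catch (IOException e) {
        if (firstFailure == null) {
          firstFailure = e; // keep the original root cause for the caller
        }
        failures.add(address + " (" + e.getMessage() + ")");
      }
    }
    throw new IOException(
        "Unable to connect to any ResourceManager: " + String.join(", ", failures),
        firstFailure);
  }
}
```

With a lookup like this, a certificate verification failure on host1.example.com:8090 and host2.example.com:8090 would surface in the final message and as the exception cause, and no Kerberos request for HTTP/0.0.0.0 would be attempted.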

        Attachments

        1. YARN-9956-001.patch
          6 kB
          Prabhu Joseph
        2. YARN-9956-002.patch
          8 kB
          Prabhu Joseph
        3. YARN-9956-003.patch
          8 kB
          Prabhu Joseph
        4. YARN-9956-004.patch
          10 kB
          Prabhu Joseph
        5. YARN-9956-005.patch
          10 kB
          Prabhu Joseph

              People

              • Assignee:
                prabhujoseph Prabhu Joseph
                Reporter:
                eyang Eric Yang