Description
If HA is configured for the Resource Manager in a secure environment, the mapred client goes into a failover loop when the user is not authenticated with Kerberos.
[root@pb6sec-1 ~]# mapred job -list
17/10/25 06:37:43 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm36
17/10/25 06:37:43 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
17/10/25 06:37:43 INFO retry.RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "host_redacted/IP_redacted"; destination host is: "com.host2.redacted:8032; , while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm36 after 1 failover attempts. Trying to failover after sleeping for 160ms.
17/10/25 06:37:43 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm25
17/10/25 06:37:43 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host_redacted/IP_redacted to com.host.redacted:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm25 after 2 failover attempts. Trying to failover after sleeping for 582ms.
17/10/25 06:37:44 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm36
17/10/25 06:37:44 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
17/10/25 06:37:44 INFO retry.RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "host_redacted/IP_redacted"; destination host is: "com.host2.redacted:8032; , while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm36 after 3 failover attempts. Trying to failover after sleeping for 977ms.
17/10/25 06:37:45 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm25
17/10/25 06:37:45 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host_redacted/IP_redacted to com.host.redacted:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm25 after 4 failover attempts. Trying to failover after sleeping for 1667ms.
17/10/25 06:37:46 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm36
17/10/25 06:37:46 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
17/10/25 06:37:46 INFO retry.RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "host_redacted/IP_redacted"; destination host is: "com.host2.redacted:8032; , while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm36 after 5 failover attempts. Trying to failover after sleeping for 2776ms.
17/10/25 06:37:49 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm25
17/10/25 06:37:49 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host_redacted/IP_redacted to com.host.redacted:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm25 after 6 failover attempts. Trying to failover after sleeping for 1055ms.
17/10/25 06:37:50 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm36
17/10/25 06:37:50 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
17/10/25 06:37:50 INFO retry.RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "host_redacted/IP_redacted"; destination host is: "com.host2.redacted:8032; , while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm36 after 7 failover attempts. Trying to failover after sleeping for 2608ms.
...
The reason is that the retry handler sees a ConnectException and then fails over to the inactive RM. This obviously doesn't work, so it comes back to the active RM and the whole process starts again. The retry handler should check whether the ConnectException is actually caused by a GSSException (and probably also check for the "No valid credentials provided" message) and, if so, it should not perform a failover.
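The proposed check can be sketched roughly as follows. This is only an illustration of the idea, not Hadoop's actual retry code: the class `FailoverDecision` and its methods `isCausedByAuthFailure` and `shouldFailover` are hypothetical names, and the sketch assumes the client-side exception still carries its cause chain (or at least the GSS message text) when the retry handler sees it.

```java
import java.io.IOException;
import java.net.ConnectException;
import javax.security.sasl.SaslException;
import org.ietf.jgss.GSSException;

// Hypothetical helper, not part of Hadoop: decides whether a failed call
// should trigger an RM failover or be reported to the user immediately.
final class FailoverDecision {
    private static final String NO_CREDENTIALS = "No valid credentials provided";
    private static final int MAX_DEPTH = 32; // guard against cyclic cause chains

    // Returns true if any exception in the cause chain points at a
    // Kerberos/SASL authentication failure.
    static boolean isCausedByAuthFailure(Throwable t) {
        for (int depth = 0; t != null && depth < MAX_DEPTH; depth++) {
            if (t instanceof SaslException || t instanceof GSSException) {
                return true;
            }
            // Fallback: the cause may have been flattened into the message.
            String message = t.getMessage();
            if (message != null && message.contains(NO_CREDENTIALS)) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    // Failing over cannot help when the client has no valid credentials,
    // so only network-level failures should trigger a failover.
    static boolean shouldFailover(Exception e) {
        return !isCausedByAuthFailure(e);
    }

    public static void main(String[] args) {
        Exception authFailure = new IOException("Failed on local exception",
            new IOException(new SaslException("GSS initiate failed",
                new RuntimeException(NO_CREDENTIALS))));
        Exception rmDown = new ConnectException("Connection refused");

        System.out.println("auth failure -> failover? " + shouldFailover(authFailure));
        System.out.println("RM down      -> failover? " + shouldFailover(rmDown));
    }
}
```

With a check of this kind, the first failed call against the active RM would surface the missing Kerberos ticket to the user instead of alternating between rm36 and rm25.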
Attachments
Issue Links
- relates to HADOOP-16580 Disable retry of FailoverOnNetworkExceptionRetry in case of AccessControlException (Resolved)