  1. Samza
  2. SAMZA-1195

Samza AM not updating AMRM token correctly


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.13.0
    • Component/s: yarn
    • Labels:
      None

      Description

      In SAMZA-727, we added Kerberos support for Samza on a secure Yarn cluster. Recently I have seen a bug.

      After the job has run for 1-2 days, all of its containers get killed and restarted. From an initial investigation, the failure pattern is as follows:

      • The Samza Application Master fails with the following messages:
        2017-04-07 11:26:01 AMRMClientAsyncImpl [ERROR] Stopping callback due to: 
        org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid AMRMToken from appattempt_1491514786235_0002_000002
        	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        	at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
        	at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
        	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
        	at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        	at java.lang.reflect.Method.invoke(Method.java:497)
        	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        	at com.sun.proxy.$Proxy16.allocate(Unknown Source)
        	at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:278)
        	at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
        Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1491514786235_0002_000002
        	at org.apache.hadoop.ipc.Client.call(Client.java:1468)
        	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
        	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        	at com.sun.proxy.$Proxy15.allocate(Unknown Source)
        	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
        	... 8 more
        
      • The AM stops because of the exception, possibly after many retries on an HA cluster.
      • After that, all Samza containers for the current AM attempt get killed.
      • The resource manager creates a new AM instance, and this repeats 3 times until the job dies.
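
      The pattern above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual Samza or Hadoop code and not the fix for this issue: it assumes the resource manager periodically rolls its AMRM master key, returns a refreshed token in heartbeat responses, and accepts the previous key for one more period. All names (`AmrmTokenRolloverSketch`, `ToyResourceManager`, `heartbeatsSurvived`) are hypothetical, and the report itself does not state the root cause beyond the title.

```java
// Toy model only: every class and method here is hypothetical,
// not part of the Samza or Hadoop APIs.
class AmrmTokenRolloverSketch {

    /** Stand-in for the resource manager's view of AMRM master keys. */
    static final class ToyResourceManager {
        private int currentKeyId = 1;

        void rollMasterKey() {
            currentKeyId++;
        }

        /**
         * Accepts a token signed with the current key or the one before it
         * (a simplified grace period) and returns the newest key id, which
         * stands in for the refreshed token carried in a heartbeat response.
         */
        int allocate(int presentedKeyId) {
            boolean valid = presentedKeyId == currentKeyId
                    || presentedKeyId == currentKeyId - 1;
            if (!valid) {
                throw new IllegalStateException("Invalid AMRMToken");
            }
            return currentKeyId;
        }
    }

    /** Returns how many heartbeats succeed before the first rejection. */
    static int heartbeatsSurvived(boolean adoptNewToken, int heartbeats, int rollEvery) {
        ToyResourceManager rm = new ToyResourceManager();
        int token = 1; // token the AM was started with
        for (int i = 1; i <= heartbeats; i++) {
            if (i % rollEvery == 0) {
                rm.rollMasterKey();
            }
            try {
                int latest = rm.allocate(token);
                if (adoptNewToken) {
                    token = latest; // a correct AM switches to the refreshed token
                }
            } catch (IllegalStateException e) {
                return i - 1; // heartbeat thread stops, as in the log above
            }
        }
        return heartbeats;
    }

    public static void main(String[] args) {
        System.out.println("ignoring refreshed token: " + heartbeatsSurvived(false, 50, 10));
        System.out.println("adopting refreshed token: " + heartbeatsSurvived(true, 50, 10));
    }
}
```

      In this model, an AM that keeps presenting its original token is rejected at the second key roll, while one that adopts each refreshed token keeps heartbeating. A delayed failure of that kind would be consistent with the job running for a while before the `InvalidToken` error appears.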

              People

              • Assignee:
                capricornius Chen Song
                Reporter:
                capricornius Chen Song
              • Votes:
                0
                Watchers:
                3
