Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0-beta
    • Fix Version/s: None
    • Component/s: resourcemanager
    • Labels: None

      Description

      Application recovery after an RM restart doesn't work as expected when using FileSystemRMStateStore.

      Steps to reproduce:

      • Run a sleep job with a single map task and a sleep time of 2 minutes
      • Restart the RM while the map task is still running
      • The first attempt fails with the following error:
        Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Password not found for ApplicationAttempt appattempt_1376294441253_0001_000001
        	at org.apache.hadoop.ipc.Client.call(Client.java:1404)
        	at org.apache.hadoop.ipc.Client.call(Client.java:1357)
        	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        	at $Proxy28.finishApplicationMaster(Unknown Source)
        	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:91)
        
      • The second attempt fails with a different error:
        Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hadoop-yarn/staging/kasha/.staging/job_1376294441253_0001/job_1376294441253_0001_2.jhist: File does not exist. Holder DFSClient_NONMAPREDUCE_389533538_1 does not have any open files.
        	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
        	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2543)
        	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2454)
        	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:534)
        	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
        	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48073)
        	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
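
The identifiers in the two traces above can be tied back to the attempts they belong to. The following is a minimal sketch, assuming the usual YARN naming layout (appattempt_<cluster timestamp>_<application number>_<attempt number> for attempts, and job_<cluster timestamp>_<job number>_<AM attempt number>.jhist for the staging history file). The function names parse_attempt_id and parse_jhist_name are hypothetical helpers written for this illustration and are not part of Hadoop.

```python
import re

# Assumed layouts, inferred from the identifiers quoted in this report.
ATTEMPT_RE = re.compile(r"appattempt_(\d+)_(\d+)_(\d+)")
JHIST_RE = re.compile(r"job_(\d+)_(\d+)_(\d+)\.jhist")


def parse_attempt_id(text):
    """Return the parts of the first appattempt_* ID found in text, or None."""
    match = ATTEMPT_RE.search(text)
    if match is None:
        return None
    timestamp, app, attempt = (int(group) for group in match.groups())
    return {"cluster_timestamp_ms": timestamp, "app": app, "attempt": attempt}


def parse_jhist_name(text):
    """Return the parts of the first job_*_N.jhist file name found in text, or None."""
    match = JHIST_RE.search(text)
    if match is None:
        return None
    timestamp, job, attempt = (int(group) for group in match.groups())
    return {"cluster_timestamp_ms": timestamp, "job": job, "attempt": attempt}


if __name__ == "__main__":
    first = parse_attempt_id(
        "Password not found for ApplicationAttempt "
        "appattempt_1376294441253_0001_000001"
    )
    second = parse_jhist_name(
        "No lease on /tmp/hadoop-yarn/staging/kasha/.staging/"
        "job_1376294441253_0001/job_1376294441253_0001_2.jhist"
    )
    print(first)
    print(second)
```

Under these assumptions, the InvalidToken error refers to attempt 1 and the LeaseExpiredException refers to the history file of attempt 2, both carrying the same cluster timestamp and application number.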
        

              People

              • Assignee: Karthik Kambatla (kasha)
              • Reporter: Karthik Kambatla (kasha)
              • Votes: 0
              • Watchers: 7
