Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
2.7.2
None
Reviewed
Description
Due to YARN-4325, many stale applications still exist in the NM state store and are recovered after an NM restart. Initialization of these applications fails because their token is no longer valid, but the exception is swallowed and an aggregator thread is still created for each invalid app.
The exception is:
2016-04-19 23:38:33,039 ERROR logaggregation.LogAggregationService (LogAggregationService.java:run(300)) - Failed to setup application log directory for application_1448060878692_11842
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 1380589 for hdfswrite) can't be found in cache
	at org.apache.hadoop.ipc.Client.call(Client.java:1427)
	at org.apache.hadoop.ipc.Client.call(Client.java:1358)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
	at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
	at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
	at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1315)
	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1311)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1311)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.checkExists(LogAggregationService.java:248)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.access$100(LogAggregationService.java:67)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:261)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:367)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:447)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
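The failure mode described above (createAppDir fails inside initAppAggregator, the error is only logged, and an aggregator is still set up) can be illustrated with a minimal, self-contained Java sketch. It does not use the real NodeManager classes: `LogAggregationSketch`, `InvalidTokenException`, `AppAggregator`, and both `initAppAggregator*` variants are hypothetical stand-ins for `LogAggregationService`, `SecretManager$InvalidToken`, and the per-application aggregator. It contrasts swallowing the createAppDir failure, which still registers an aggregator for the stale app, with letting the failure propagate to the caller so that no aggregator is created. This is an illustration of the pattern, not the actual patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LogAggregationSketch {

    /** Hypothetical stand-in for SecretManager$InvalidToken surfacing from HDFS. */
    static class InvalidTokenException extends Exception {
        InvalidTokenException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for the per-application aggregator (a thread in the real service). */
    static class AppAggregator {
        final String appId;
        AppAggregator(String appId) { this.appId = appId; }
    }

    // Aggregators registered per application id.
    final Map<String, AppAggregator> aggregators = new ConcurrentHashMap<>();
    // Simulates whether the app's delegation token is still known to the NameNode.
    private final boolean tokenValid;

    LogAggregationSketch(boolean tokenValid) { this.tokenValid = tokenValid; }

    // Simulates createAppDir(): the remote directory check fails when the token is stale.
    void createAppDir(String appId) throws InvalidTokenException {
        if (!tokenValid) {
            throw new InvalidTokenException("token can't be found in cache");
        }
    }

    // Buggy pattern: the failure is only logged, so an aggregator is still registered.
    void initAppAggregatorSwallowing(String appId) {
        try {
            createAppDir(appId);
        } catch (InvalidTokenException e) {
            System.err.println("Failed to setup application log directory for " + appId);
        }
        aggregators.put(appId, new AppAggregator(appId));
    }

    // Alternative pattern: the failure propagates before any aggregator is registered.
    void initAppAggregatorPropagating(String appId) throws InvalidTokenException {
        createAppDir(appId);
        aggregators.put(appId, new AppAggregator(appId));
    }

    // Caller (compare initApp): returns false on failure instead of leaving a stale aggregator.
    boolean initApp(String appId) {
        try {
            initAppAggregatorPropagating(appId);
            return true;
        } catch (InvalidTokenException e) {
            System.err.println("Log aggregation init failed for " + appId + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        String appId = "application_1448060878692_11842";

        // Stale token, exception swallowed: an aggregator is still created.
        LogAggregationSketch buggy = new LogAggregationSketch(false);
        buggy.initAppAggregatorSwallowing(appId);
        System.out.println("swallowing: aggregators=" + buggy.aggregators.size());

        // Stale token, exception propagated: nothing is registered.
        LogAggregationSketch fixed = new LogAggregationSketch(false);
        boolean ok = fixed.initApp(appId);
        System.out.println("propagating: ok=" + ok + ", aggregators=" + fixed.aggregators.size());
    }
}
```

Running `main` prints `swallowing: aggregators=1` and then `propagating: ok=false, aggregators=0` on stdout (the error messages go to stderr). Propagating the exception keeps the decision about a failed app with the caller instead of leaving a half-initialized aggregator behind; how the real service reports such a failure is outside the scope of this sketch.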