Details
Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Description
Seen in one of our environments: the AM was allocated a container, but the update to ZK failed while the cluster was having intermittent network problems (up and down) for some time. The following was logged:
2015-12-07 16:39:38,489 | WARN | IPC Server handler 49 on 26003 | IPC Server handler 49 on 26003, call org.apache.hadoop.yarn.server.api.ResourceTrackerPB.registerNodeManager from 9.91.8.220:52169 Call#17 Retry#0 | Server.java:2107
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.handleNMContainerStatus(ResourceTrackerService.java:286)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.registerNodeManager(ResourceTrackerService.java:395)
	at org.apache.hadoop.yarn.server.api.impl.pb.service.ResourceTrackerPBServiceImpl.registerNodeManager(ResourceTrackerPBServiceImpl.java:54)
	at org.apache.hadoop.yarn.proto.ResourceTracker$ResourceTrackerService$2.callBlockingMethod(ResourceTracker.java:79)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:973)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2088)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2084)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2082)
The corresponding code is below. The line numbers may not match branch-2.7 or trunk, since we have modified the code internally.
284 RMAppAttempt rmAppAttempt = rmApp.getRMAppAttempt(appAttemptId);
285 Container masterContainer = rmAppAttempt.getMasterContainer();
286 if (masterContainer.getId().equals(containerStatus.getContainerId())
287     && containerStatus.getContainerState() == ContainerState.COMPLETE) {
288   ContainerStatus status =
289       ContainerStatus.newInstance(containerStatus.getContainerId(),
290         containerStatus.getContainerState(), containerStatus.getDiagnostics(),
291         containerStatus.getContainerExitStatus());
292   // sending master container finished event.
293   RMAppAttemptContainerFinishedEvent evt =
294       new RMAppAttemptContainerFinishedEvent(appAttemptId, status,
295           nodeId);
296   rmContext.getDispatcher().getEventHandler().handle(evt);
297 }
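The trace points at line 286, where masterContainer is dereferenced. One plausible reading (an assumption, not confirmed by the log) is that getMasterContainer() returned null because the attempt's master container was never recorded after the ZK update failed. A minimal sketch of a null guard for that condition follows. It uses small stand-in types (Container, ReportedStatus) and a hypothetical helper isFinishedMasterContainer rather than the real YARN classes, so it only illustrates the shape of the check.

```java
final class MasterContainerGuard {

    /** Stand-in for the attempt's master container (hypothetical, not the YARN type). */
    static final class Container {
        private final String id;

        Container(String id) {
            this.id = id;
        }

        String getId() {
            return id;
        }
    }

    /** Stand-in for a container status reported by the NM (hypothetical type). */
    static final class ReportedStatus {
        private final String containerId;
        private final boolean complete;

        ReportedStatus(String containerId, boolean complete) {
            this.containerId = containerId;
            this.complete = complete;
        }

        String getContainerId() {
            return containerId;
        }

        boolean isComplete() {
            return complete;
        }
    }

    /**
     * Mirrors the condition at lines 286-287, but returns false instead of
     * throwing when the master container is missing.
     */
    static boolean isFinishedMasterContainer(Container masterContainer,
                                             ReportedStatus reported) {
        if (masterContainer == null) {
            // Assumed failure mode: the master container was never recorded.
            return false;
        }
        return masterContainer.getId().equals(reported.getContainerId())
            && reported.isComplete();
    }

    public static void main(String[] args) {
        ReportedStatus done = new ReportedStatus("container_1", true);
        // Missing master container: no exception, condition is simply false.
        System.out.println(isFinishedMasterContainer(null, done));
        // Matching, completed master container: condition is true.
        System.out.println(isFinishedMasterContainer(new Container("container_1"), done));
    }
}
```

In the real method the same guard would amount to adding a masterContainer != null check before the comparison on line 286. Whether skipping the finished event is the right behaviour for an attempt without a recorded master container is a separate question that this sketch does not answer.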