Description
Saw a Tez AM hang due to an NPE in the DelayedContainerManager:
2016-07-17 01:53:23,157 [ERROR] [DelayedContainerManager] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[DelayedContainerManager,5,main] threw an Exception. java.lang.NullPointerException at org.apache.tez.dag.app.rm.TezAMRMClientAsync.getMatchingRequestsForTopPriority(TezAMRMClientAsync.java:142) at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.getMatchingRequestWithoutPriority(YarnTaskSchedulerService.java:1474) at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.access$500(YarnTaskSchedulerService.java:84) at org.apache.tez.dag.app.rm.YarnTaskSchedulerService$NodeLocalContainerAssigner.assignReUsedContainer(YarnTaskSchedulerService.java:1869) at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.assignReUsedContainerWithLocation(YarnTaskSchedulerService.java:1753) at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.assignDelayedContainer(YarnTaskSchedulerService.java:733) at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.access$600(YarnTaskSchedulerService.java:84) at org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.run(YarnTaskSchedulerService.java:2030)
After the DelayedContainerManager thread exited the AM proceeded to receive requested containers that would go unused until the container allocations expired. Then they would be re-requested, and the cycle repeated indefinitely.
Attachments
Attachments
Issue Links
- relates to
-
TEZ-3464 Fix findbugs warnings in tez-dag mainLoop
- Closed