Description
On Hadoop versions that predate YARN-8933, when a Tez app runs in a YARN Federation cluster, getAvailableResources may return null, which then causes an NPE in YarnTaskSchedulerService:
2022-08-03 01:40:12,069 [ERROR] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Got Error from RMClient
java.lang.NullPointerException
    at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.fitsIn(YarnTaskSchedulerService.java:1445)
    at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.preemptIfNeeded(YarnTaskSchedulerService.java:1218)
    at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.getProgress(YarnTaskSchedulerService.java:916)
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:428)
2022-08-03 01:40:12,075 [ERROR] [AMRM Callback Handler Thread] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[AMRM Callback Handler Thread,5,main] threw an Exception.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.NullPointerException
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:432)
Caused by: java.lang.NullPointerException
    at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.fitsIn(YarnTaskSchedulerService.java:1445)
    at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.preemptIfNeeded(YarnTaskSchedulerService.java:1218)
    at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.getProgress(YarnTaskSchedulerService.java:916)
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:428)
In YARN Federation, AMRMProxy connects to multiple RMs asynchronously, so AllocateResponse::getAvailableResources may return null, which leads to the NPE above.
In my PR, I replace a null result with Resource.newInstance(0, 0). A null value may simply mean that YARN is busy, so treating it as zero available resources is reasonable.
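A minimal sketch of the null guard described above, not the actual patch. To keep it compilable without Hadoop on the classpath, `Res` is a hypothetical stand-in for `org.apache.hadoop.yarn.api.records.Resource`, and `HeadroomGuard`, `headroomOrZero`, and this simplified `fitsIn` are illustrative names, not Tez code:

```java
// Hypothetical stand-in for org.apache.hadoop.yarn.api.records.Resource,
// used only so this sketch compiles without Hadoop.
final class Res {
    final long memory;
    final int vCores;

    Res(long memory, int vCores) {
        this.memory = memory;
        this.vCores = vCores;
    }
}

// Illustrative helper; the names here are assumptions, not Tez's API.
final class HeadroomGuard {
    // Plays the role of Resource.newInstance(0, 0) in the real code.
    static final Res ZERO = new Res(0, 0);

    // Treat a null headroom (e.g. no answer yet from every sub-cluster RM)
    // as "nothing available right now" instead of letting it propagate.
    static Res headroomOrZero(Res reported) {
        return reported != null ? reported : ZERO;
    }

    // Simplified fit check: dereferences both arguments, so a null
    // 'available' would throw NullPointerException without the guard.
    static boolean fitsIn(Res needed, Res available) {
        return needed.memory <= available.memory
            && needed.vCores <= available.vCores;
    }
}
```

With the guard applied before the fit check, a null headroom makes any non-empty request look like it does not fit, rather than killing the AMRM callback thread with an uncaught exception.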