Description
When the timeline client is stopped, it waits for a drain timeout and then cancels every sync EntitiesHolder still left in the queue.
// if some entities were not drained then we need interrupt
// the threads which had put sync EntityHolders to the queue.
EntitiesHolder nextEntityInTheQueue = null;
while ((nextEntityInTheQueue = timelineEntityQueue.poll()) != null) {
  nextEntityInTheQueue.cancel(true);
}
The sync path that waits on the holder handles only ExecutionException and InterruptedException:
if (sync) {
  // In sync call we need to wait till its published and if any error then
  // throw it back
  try {
    entitiesHolder.get();
  } catch (ExecutionException e) {
    throw new YarnException("Failed while publishing entity", e.getCause());
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    throw new YarnException("Interrupted while publishing entity", e);
  }
}
But calling nextEntityInTheQueue.cancel(true) makes entitiesHolder.get() throw a CancellationException, which is an unchecked exception and is not handled here. It propagates up to the dispatcher thread and can result in a FATAL error in the NM. We need to prevent this.
FATAL event.AsyncDispatcher (AsyncDispatcher.java:dispatch(203)) - Error in dispatcher thread
java.util.concurrent.CancellationException
    at java.util.concurrent.FutureTask.report(FutureTask.java:121)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher.dispatchEntities(TimelineV2ClientImpl.java:545)
    at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putEntities(TimelineV2ClientImpl.java:149)
    at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:348)
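The behaviour can be reproduced outside Hadoop with a plain FutureTask, which is the class that appears in the stack trace. The sketch below is only an illustration of one possible way to handle the cancellation, not the actual patch: CancelledHolderSketch, waitForPublish and PublishException are hypothetical names, and PublishException stands in for YarnException so the example compiles without Hadoop on the classpath. It assumes that wrapping the CancellationException in the same checked exception as the other two cases is acceptable to callers.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

public class CancelledHolderSketch {

  /** Hypothetical stand-in for YarnException. */
  static class PublishException extends Exception {
    PublishException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /**
   * Mirrors the sync branch quoted above and adds a handler for
   * CancellationException (the extra catch is an assumption, not the
   * project's confirmed fix).
   */
  static void waitForPublish(FutureTask<Void> entitiesHolder)
      throws PublishException {
    try {
      entitiesHolder.get();
    } catch (ExecutionException e) {
      throw new PublishException("Failed while publishing entity",
          e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new PublishException("Interrupted while publishing entity", e);
    } catch (CancellationException e) {
      // Without this catch the unchecked exception escapes to the caller,
      // which is what reaches the dispatcher thread in the trace above.
      throw new PublishException(
          "Publishing cancelled because the client was stopped", e);
    }
  }

  public static void main(String[] args) {
    // A holder that was queued but never run, then cancelled on stop.
    FutureTask<Void> holder = new FutureTask<>(() -> null);
    holder.cancel(true);
    try {
      waitForPublish(holder);
    } catch (PublishException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With the extra catch removed, the same main method terminates with an uncaught java.util.concurrent.CancellationException thrown from FutureTask.get(), matching the trace in this report.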