  1. Hadoop YARN
  2. YARN-9802 YARN Timeline Service v2 (post GA features)
  3. YARN-10240

Prevent Fatal CancelledException in TimelineV2Client when stopping

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.4.0
    • Component/s: ATSv2
    • Labels: None

    Description

      When the timeline client is stopped, it waits for a drain timeout and then cancels every sync EntitiesHolder still in the queue.

      // if some entities were not drained then we need interrupt
      // the threads which had put sync EntityHolders to the queue.
      EntitiesHolder nextEntityInTheQueue = null;
      while ((nextEntityInTheQueue =
          timelineEntityQueue.poll()) != null) {
        nextEntityInTheQueue.cancel(true);
      }
      

      The sync publish path handles only ExecutionException and InterruptedException when it waits on the holder:

      if (sync) {
        // In sync call we need to wait till its published and if any error then
        // throw it back
        try {
          entitiesHolder.get();
        } catch (ExecutionException e) {
          throw new YarnException("Failed while publishing entity",
              e.getCause());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new YarnException("Interrupted while publishing entity", e);
        }
      }
      

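      However, calling nextEntityInTheQueue.cancel(true) causes entitiesHolder.get() to throw a CancellationException, which is not handled. Because it is an unchecked exception, it propagates out of putEntities and can cause a FATAL error in the NodeManager (NM) dispatcher thread. We need to prevent this.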
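
      The behaviour described above can be reproduced with a plain FutureTask, which is the class the stack trace shows get() being called on. This is a minimal sketch, not code from TimelineV2ClientImpl: the class name CancelledGetDemo and the method exceptionAfterCancel are hypothetical, and the FutureTask stands in for a sync holder that is still queued when the client stops.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

// Hypothetical demo class; not part of Hadoop.
class CancelledGetDemo {

  /** Reports which exception type get() raises after cancel(true). */
  static String exceptionAfterCancel() {
    // Stand-in for a sync holder that was queued but never run.
    FutureTask<Void> holder = new FutureTask<Void>(() -> null);
    holder.cancel(true);
    try {
      holder.get();
      return "none";
    } catch (ExecutionException e) {
      return "ExecutionException";
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "InterruptedException";
    } catch (CancellationException e) {
      // Unchecked, so the compiler never forces callers to handle it.
      return "CancellationException";
    }
  }

  public static void main(String[] args) {
    System.out.println(exceptionAfterCancel());
  }
}
```

      Neither of the first two catch clauses matches, so without the third one the exception would escape the try block.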

      FATAL event.AsyncDispatcher (AsyncDispatcher.java:dispatch(203)) - Error in dispatcher thread
      java.util.concurrent.CancellationException
      	at java.util.concurrent.FutureTask.report(FutureTask.java:121)
      	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
      	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher.dispatchEntities(TimelineV2ClientImpl.java:545)
      	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putEntities(TimelineV2ClientImpl.java:149)
      	at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:348)
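
      One possible way to keep the exception from reaching the dispatcher is to catch CancellationException next to the other two cases and rethrow it as a checked exception. The sketch below is an assumption about the shape of such a fix, not the contents of the attached patch: PublishException is a hypothetical stand-in for YarnException so that the example compiles on its own, and awaitPublished, SyncPublishSketch and the cancellation message are illustrative names.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

// Hypothetical sketch class; not part of Hadoop.
class SyncPublishSketch {

  /** Hypothetical stand-in for YarnException. */
  static class PublishException extends Exception {
    PublishException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Waits for a sync holder and converts each failure mode to a checked exception. */
  static void awaitPublished(Future<?> holder) throws PublishException {
    try {
      holder.get();
    } catch (ExecutionException e) {
      throw new PublishException("Failed while publishing entity", e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new PublishException("Interrupted while publishing entity", e);
    } catch (CancellationException e) {
      // Raised when the holder was cancelled, for example while the client stops.
      throw new PublishException("Cancelled while publishing entity", e);
    }
  }

  public static void main(String[] args) {
    FutureTask<Void> holder = new FutureTask<Void>(() -> null);
    holder.cancel(true); // simulates the stop path cancelling a queued holder
    try {
      awaitPublished(holder);
    } catch (PublishException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

      Callers that already handle the checked exception would then see a cancellation as an ordinary publish failure instead of an uncaught runtime exception.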
      

      Attachments

        1. YARN-10240.001.patch
          1 kB
          Tarun Parimi

          People

            Assignee: Tarun Parimi (tarunparimi)
            Reporter: Tarun Parimi (tarunparimi)