Hadoop YARN / YARN-6695

Race condition in RM for publishing container events vs appFinished events causes NPE


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0, 3.2.1, 3.1.3
    • Component/s: None
    • Labels: None

    Description

      When the RM publishes container events (i.e., when yarn.rm.system-metrics-publisher.emit-container-events is enabled), there is a race condition between processing those events
      and the appFinished event, which removes the appId from the collector list. A container event that is processed after this removal causes an NPE.
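A minimal, self-contained Java model of the race may make the ordering concrete. It uses no Hadoop classes: `CollectorRaceDemo`, `collectors` and `runRace` are hypothetical names, a `ConcurrentHashMap` stands in for the RM's collector registry, and a single-threaded executor stands in for the `AsyncDispatcher` thread. The sketch assumes, for illustration only, that the appFinished removal happens while a container event is still queued; a latch forces that ordering so the outcome is deterministic. It does not reproduce the RM's actual threading.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the race; none of these names are Hadoop APIs.
class CollectorRaceDemo {
    // Stand-in for the RM's appId -> collector registry.
    static final Map<String, StringBuilder> collectors = new ConcurrentHashMap<>();

    // Returns how many NPEs the "dispatcher" thread hit while publishing.
    static int runRace() {
        String appId = "application_1496758895643_0005";
        collectors.put(appId, new StringBuilder());

        ExecutorService dispatcher = Executors.newSingleThreadExecutor(); // plays AsyncDispatcher
        CountDownLatch gate = new CountDownLatch(1);
        AtomicInteger npes = new AtomicInteger();

        // Hold the dispatcher so the container event below stays queued.
        dispatcher.submit(() -> { gate.await(); return null; });
        // Queued container event: an unguarded lookup, like dereferencing the collector in putEntity.
        dispatcher.submit(() -> {
            try {
                collectors.get(appId).append("YARN_CONTAINER event");
            } catch (NullPointerException e) {
                npes.incrementAndGet();
            }
        });

        // appFinished path wins: the collector is removed before the event is dispatched.
        collectors.remove(appId);
        gate.countDown();

        dispatcher.shutdown();
        try {
            if (!dispatcher.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("dispatcher did not drain");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return npes.get();
    }

    public static void main(String[] args) {
        System.out.println("NPEs: " + runRace()); // prints "NPEs: 1"
    }
}
```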

      In the trace below, the appId is removed from the collectors first, and the corresponding container events are processed afterwards.

      2017-06-06 19:28:48,896 INFO  capacity.ParentQueue (ParentQueue.java:removeApplication(472)) - Application removed - appId: application_1496758895643_0005 user: root leaf-queue of parent: root #applications: 0
      2017-06-06 19:28:48,921 INFO  collector.TimelineCollectorManager (TimelineCollectorManager.java:remove(190)) - The collector service for application_1496758895643_0005 was removed
      2017-06-06 19:28:48,922 ERROR metrics.TimelineServiceV2Publisher (TimelineServiceV2Publisher.java:putEntity(451)) - Error when publishing entity TimelineEntity[type='YARN_CONTAINER', id='container_e01_1496758895643_0005_01_000002']
      java.lang.NullPointerException
      	at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:448)
      	at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.access$100(TimelineServiceV2Publisher.java:72)
      	at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:480)
      	at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:469)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:201)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:127)
      	at java.lang.Thread.run(Thread.java:745)
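One defensive way to avoid the NPE, sketched here under stated assumptions, is to treat a missing collector as a normal outcome of the race: look the collector up once, and skip the entity if it is already gone. `GuardedPublisher`, `addCollector`, `removeCollector` and this `putEntity` are hypothetical stand-ins loosely modeled on the method named in the trace. They are not Hadoop APIs, and this is not necessarily what the attached patches do.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.logging.Logger;

// Hypothetical sketch of a null-guarded publish path; not the committed YARN-6695 patch.
class GuardedPublisher {
    private static final Logger LOG = Logger.getLogger(GuardedPublisher.class.getName());

    // appId -> collector; a list of entity ids stands in for a real collector.
    private final Map<String, List<String>> collectors = new ConcurrentHashMap<>();

    void addCollector(String appId) {
        collectors.put(appId, new CopyOnWriteArrayList<>());
    }

    // appFinished path.
    void removeCollector(String appId) {
        collectors.remove(appId);
    }

    // Returns true if the entity was published, false if it was dropped.
    boolean putEntity(String appId, String entityId) {
        // Read the collector exactly once, so a concurrent remove cannot land
        // between the null check and the dereference.
        List<String> collector = collectors.get(appId);
        if (collector == null) {
            LOG.fine("Collector for " + appId + " already removed; dropping " + entityId);
            return false;
        }
        collector.add(entityId);
        return true;
    }

    public static void main(String[] args) {
        GuardedPublisher publisher = new GuardedPublisher();
        String appId = "application_1496758895643_0005";
        publisher.addCollector(appId);
        System.out.println(publisher.putEntity(appId, "container_e01_1496758895643_0005_01_000001")); // true
        publisher.removeCollector(appId); // appFinished wins the race
        System.out.println(publisher.putEntity(appId, "container_e01_1496758895643_0005_01_000002")); // false
    }
}
```

The design choice here is to drop late events rather than fail: a container event that arrives after appFinished is logged at a fine level and skipped, instead of producing the ERROR and NPE shown in the trace. Avoiding the loss entirely would require delaying collector removal until queued events have been dispatched, which this sketch does not attempt.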
      

      Attachments

        1. YARN-6695.001.patch (1 kB, Eric Yang)
        2. YARN-6695-002.patch (6 kB, Prabhu Joseph)


          People

            Assignee: Prabhu Joseph (prabhujoseph)
            Reporter: Rohith Sharma K S (rohithsharma)
            Votes: 0
            Watchers: 14
