Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-6695

Race condition in RM for publishing container events vs appFinished events causes NPE

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • None
    • 3.3.0, 3.2.1, 3.1.3
    • None
    • None

    Description

      When RM publishes container events i.e by enabling yarn.rm.system-metrics-publisher.emit-container-events, there is race condition for processing events
      vs appFinished event that removes appId from collector list which cause NPE.

      Look at the below trace where appId is removed from collectors first and then corresponding events are processed.

      2017-06-06 19:28:48,896 INFO  capacity.ParentQueue (ParentQueue.java:removeApplication(472)) - Application removed - appId: application_1496758895643_0005 user: root leaf-queue of parent: root #applications: 0
      2017-06-06 19:28:48,921 INFO  collector.TimelineCollectorManager (TimelineCollectorManager.java:remove(190)) - The collector service for application_1496758895643_0005 was removed
      2017-06-06 19:28:48,922 ERROR metrics.TimelineServiceV2Publisher (TimelineServiceV2Publisher.java:putEntity(451)) - Error when publishing entity TimelineEntity[type='YARN_CONTAINER', id='container_e01_1496758895643_0005_01_000002']
      java.lang.NullPointerException
      	at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:448)
      	at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.access$100(TimelineServiceV2Publisher.java:72)
      	at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:480)
      	at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:469)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:201)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:127)
      	at java.lang.Thread.run(Thread.java:745)
      

      Attachments

        1. YARN-6695.001.patch
          1 kB
          Eric Yang
        2. YARN-6695-002.patch
          6 kB
          Prabhu Joseph

        Issue Links

          Activity

            People

              prabhujoseph Prabhu Joseph
              rohithsharma Rohith Sharma K S
              Votes:
              0 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: