Hadoop YARN / YARN-6695

Race condition in RM for publishing container events vs appFinished events causes NPE


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0, 3.2.1, 3.1.3
    • Component/s: None
    • Labels:
      None

      Description

      When the RM publishes container events (i.e., when yarn.rm.system-metrics-publisher.emit-container-events is enabled), there is a race condition between
      processing those events and the appFinished event, which removes the appId from the collector list. If the collector is removed first, publishing the container event causes an NPE.
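      The ordering problem can be modeled with a small, hypothetical sketch, assuming that container events are queued for later dispatch while appFinished removes the application's collector immediately. PublisherRaceSketch and its methods are illustrative names, not Hadoop APIs, and the interleaving is forced on a single thread so the failing order is reproducible.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Simplified, hypothetical model of the RM publishing path.
 * Class and method names are illustrative only; they are not Hadoop APIs.
 * The interleaving is forced on one thread so the outcome is deterministic.
 */
public class PublisherRaceSketch {
    /** Stand-in for a per-application timeline collector. */
    static final class Collector {
        final List<String> entities = new ArrayList<>();

        void putEntity(String entity) {
            entities.add(entity);
        }
    }

    /** A queued container event waiting for the publisher's dispatcher. */
    static final class ContainerEvent {
        final String appId;
        final String containerId;

        ContainerEvent(String appId, String containerId) {
            this.appId = appId;
            this.containerId = containerId;
        }
    }

    // appId -> collector, loosely analogous to the collector manager's registry.
    final Map<String, Collector> collectors = new ConcurrentHashMap<>();
    // Pending events, loosely analogous to the publisher's AsyncDispatcher queue.
    final Queue<ContainerEvent> pending = new ArrayDeque<>();

    void appStarted(String appId) {
        collectors.put(appId, new Collector());
    }

    /** Container events are only queued here; publishing happens later in dispatchAll(). */
    void containerFinished(String appId, String containerId) {
        pending.add(new ContainerEvent(appId, containerId));
    }

    /** appFinished removes the collector right away, without waiting for the queue to drain. */
    void appFinished(String appId) {
        collectors.remove(appId);
    }

    /** Unguarded publish: throws NullPointerException if the collector is already gone. */
    void dispatchAll() {
        ContainerEvent event;
        while ((event = pending.poll()) != null) {
            collectors.get(event.appId).putEntity(event.containerId);
        }
    }

    public static void main(String[] args) {
        PublisherRaceSketch rm = new PublisherRaceSketch();
        rm.appStarted("application_1");
        rm.containerFinished("application_1", "container_1_01_000002");
        rm.appFinished("application_1"); // appFinished wins the race
        try {
            rm.dispatchAll();
            System.out.println("published");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException: collector already removed");
        }
    }
}
```

      If dispatchAll() runs before appFinished(), the entity is published normally; only the reversed order fails, which is what makes this a race rather than a deterministic bug.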

      In the trace below, the collector for the appId is removed first, and the corresponding container events are processed afterwards.

      2017-06-06 19:28:48,896 INFO  capacity.ParentQueue (ParentQueue.java:removeApplication(472)) - Application removed - appId: application_1496758895643_0005 user: root leaf-queue of parent: root #applications: 0
      2017-06-06 19:28:48,921 INFO  collector.TimelineCollectorManager (TimelineCollectorManager.java:remove(190)) - The collector service for application_1496758895643_0005 was removed
      2017-06-06 19:28:48,922 ERROR metrics.TimelineServiceV2Publisher (TimelineServiceV2Publisher.java:putEntity(451)) - Error when publishing entity TimelineEntity[type='YARN_CONTAINER', id='container_e01_1496758895643_0005_01_000002']
      java.lang.NullPointerException
      	at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:448)
      	at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.access$100(TimelineServiceV2Publisher.java:72)
      	at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:480)
      	at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:469)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:201)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:127)
      	at java.lang.Thread.run(Thread.java:745)
      
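      The trace shows the NPE inside TimelineServiceV2Publisher.putEntity, presumably because the collector lookup for an already-removed appId returns null. A minimal sketch of one possible guard, assuming the publisher looks up a per-application collector by appId: read the collector once, and if it is null, log and drop the late event instead of dereferencing it. GuardedPublisherSketch and its methods are hypothetical; this is not the attached YARN-6695 patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

/**
 * Hypothetical sketch of a null guard around the collector lookup.
 * Names are illustrative; this is not the committed YARN-6695 patch.
 */
public class GuardedPublisherSketch {
    private static final Logger LOG = Logger.getLogger(GuardedPublisherSketch.class.getName());

    /** Stand-in for a per-application timeline collector. */
    static final class Collector {
        final List<String> entities = new ArrayList<>();

        void putEntity(String entity) {
            entities.add(entity);
        }
    }

    final Map<String, Collector> collectors = new ConcurrentHashMap<>();

    void appStarted(String appId) {
        collectors.put(appId, new Collector());
    }

    void appFinished(String appId) {
        collectors.remove(appId);
    }

    /**
     * Publishes the entity if the application's collector still exists.
     * Returns false (and logs at FINE) when the collector was already removed.
     */
    boolean putEntity(String appId, String entity) {
        // Look the collector up once and reuse the reference, so a concurrent
        // remove between a null check and the call cannot cause an NPE.
        Collector collector = collectors.get(appId);
        if (collector == null) {
            LOG.fine(() -> "Collector for " + appId + " already removed; dropping " + entity);
            return false;
        }
        collector.putEntity(entity);
        return true;
    }

    public static void main(String[] args) {
        GuardedPublisherSketch publisher = new GuardedPublisherSketch();
        publisher.appStarted("application_1");
        publisher.appFinished("application_1"); // the collector is gone before the event is handled
        boolean published = publisher.putEntity("application_1", "container_1_01_000002");
        System.out.println(published ? "published" : "dropped late container event");
    }
}
```

      Dropping the late event avoids the NPE but loses that container entity for the finished application; an alternative would be to delay collector removal until pending events drain, at the cost of extra bookkeeping.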

        Attachments

        1. YARN-6695.001.patch
          1 kB
          Eric Yang
        2. YARN-6695-002.patch
          6 kB
          Prabhu Joseph

              People

              • Assignee: Prabhu Joseph (prabhujoseph)
              • Reporter: Rohith Sharma K S (rohithsharma)
              • Votes: 0
              • Watchers: 14