  1. Hadoop YARN
  2. YARN-2928 YARN Timeline Service v.2: alpha 1
  3. YARN-4711

NM is going down with NPEs due to single-threaded processing of events by the Timeline client


    Details

    • Hadoop Flags: Reviewed

      Description

      After YARN-3367, while testing the latest YARN-2928 branch, I came across a few NPEs that cause the NM to shut down.

      2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
      java.lang.NullPointerException
              at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306)
              at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296)
              at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
              at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
              at java.lang.Thread.run(Thread.java:745)
      
      java.lang.NullPointerException
              at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213)
              at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192)
              at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63)
              at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289)
              at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280)
              at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
              at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
              at java.lang.Thread.run(Thread.java:745)
      

      On analysis, I found that there was a delay in the processing of events: after YARN-3367, all events were being processed by a single thread inside the timeline client.
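      One way a processing delay like this can surface as an NPE is sketched below. This is an illustration only, not the NMTimelinePublisher code: the class name, the `containers` map, and the idea that state is cleaned up before a queued event runs are all assumptions made for the example. A slow event occupies the only worker thread, so the next event runs after the state it refers to has been removed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names come from the Hadoop code base.
public class DelayedEventSketch {

    // Returns true if the queued handler found the container state missing
    // by the time the single worker thread got around to running it.
    static boolean handlerSawMissingState() throws Exception {
        Map<String, String> containers = new ConcurrentHashMap<>();
        containers.put("container_1", "RUNNING");

        ExecutorService singleThread = Executors.newSingleThreadExecutor();
        CountDownLatch slowEventDone = new CountDownLatch(1);
        try {
            // A slow event occupies the only worker thread.
            singleThread.submit(() -> {
                slowEventDone.await();
                return null;
            });

            // The event for container_1 is queued behind the slow one.
            Future<Boolean> sawNull =
                singleThread.submit(() -> containers.get("container_1") == null);

            // Meanwhile the container finishes and its state is cleaned up.
            containers.remove("container_1");

            // Only now does the worker become free to handle the queued event.
            slowEventDone.countDown();
            return sawNull.get();
        } finally {
            singleThread.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("handler saw missing state: " + handlerSawMissingState());
    }
}
```

      A handler that dereferenced the looked-up value without a null check would throw at this point. Whether this is the exact sequence behind the traces above is not confirmed by the description.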

      Additionally, I found one more scenario where an NPE is possible:

      • TimelineEntity.toString() when real is not null
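      The toString() scenario can be sketched as follows. The class below is a hypothetical stand-in, not the actual TimelineEntity source; it assumes a wrapper object that delegates to a `real` entity and leaves its own `identifier` field unset, which is one plausible way toString() could fail when `real` is not null.

```java
import java.util.Objects;

// Hypothetical sketch: Entity is NOT the real TimelineEntity class.
public class WrappedEntitySketch {

    static class Entity {
        private final String identifier; // null when this object is only a wrapper
        private final Entity real;       // non-null when this object wraps another entity

        Entity(String identifier) {
            this.identifier = identifier;
            this.real = null;
        }

        Entity(Entity real) {
            this.identifier = null;
            this.real = Objects.requireNonNull(real);
        }

        // Accessor that delegates to the wrapped entity when there is one.
        String getIdentifier() {
            return real == null ? identifier : real.getIdentifier();
        }

        // Reads the field directly: throws an NPE on a wrapper,
        // because its own identifier field is null.
        String unsafeToString() {
            return identifier.toString();
        }

        // Goes through the accessor and tolerates a missing identifier.
        String safeToString() {
            return String.valueOf(getIdentifier());
        }
    }

    public static void main(String[] args) {
        Entity wrapper = new Entity(new Entity("container_1"));
        System.out.println(wrapper.safeToString());
        try {
            wrapper.unsafeToString();
        } catch (NullPointerException e) {
            System.out.println("NPE from unsafeToString()");
        }
    }
}
```

      Routing toString() through the same accessor the other getters use is one possible guard; the attached patches may address it differently.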

        Attachments

        1. 4711Analysis.txt
          7 kB
          Naganarasimha G R
        2. YARN-4711-YARN-2928.v1.001.patch
          42 kB
          Naganarasimha G R
        3. YARN-4711-YARN-2928.v1.002.patch
          42 kB
          Naganarasimha G R

            People

            • Assignee: Naganarasimha G R
            • Reporter: Naganarasimha G R
