Hadoop YARN / YARN-2387

Resource Manager crashes with NPE due to lack of synchronization

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.5.0, 3.0.0-alpha1
    • Fix Version/s: 2.6.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      We recently came across a 0.23 RM crashing with an NPE. Here is the stacktrace for it.

      2014-08-06 05:56:52,165 [ResourceManager Event Processor] FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
      java.lang.NullPointerException
              at org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToBuilder(ContainerStatusPBImpl.java:61)
              at org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToProto(ContainerStatusPBImpl.java:68)
              at org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:53)
              at org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:34)
              at org.apache.hadoop.yarn.api.records.ProtoBase.toString(ProtoBase.java:55)
              at java.lang.String.valueOf(String.java:2854)
              at java.lang.StringBuilder.append(StringBuilder.java:128)
              at org.apache.hadoop.yarn.api.records.impl.pb.ContainerPBImpl.toString(ContainerPBImpl.java:353)
              at java.lang.String.valueOf(String.java:2854)
              at java.lang.StringBuilder.append(StringBuilder.java:128)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1405)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:790)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:602)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:688)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:82)
              at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:339)
              at java.lang.Thread.run(Thread.java:722)
      2014-08-06 05:56:52,166 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
      

      On investigating the issue, we found that ContainerStatusPBImpl has methods that are called by different threads but are not synchronized. The 2.x code looks the same.

      We need to make these methods synchronized so that we do not encounter this problem in the future.
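
      To illustrate the kind of race described above, here is a minimal sketch of a record that lazily merges locally cached state into a builder, with its accessors marked synchronized. StatusRecord, its fields, and the stress helper are hypothetical stand-ins invented for this example; this is not the real ContainerStatusPBImpl code or the attached patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SynchronizedRecordSketch {

    /** Hypothetical stand-in for a PBImpl-style record (assumption, not YARN code). */
    static class StatusRecord {
        private String proto = "state=NEW";   // last built snapshot
        private StringBuilder builder = null; // non-null only while an edit is pending
        private String localState = null;     // locally cached field awaiting merge

        // Writer path: stage a change. Synchronized so it cannot interleave with getProto().
        public synchronized void setState(String state) {
            if (builder == null) {
                builder = new StringBuilder();
            }
            localState = state;
        }

        // Reader path: merge pending local state into the snapshot.
        // Without synchronization, a second thread could set `builder` to null
        // between the check and the append below, which would throw an NPE.
        public synchronized String getProto() {
            if (localState != null) {
                builder.append("state=").append(localState);
                proto = builder.toString();
                builder = null;
                localState = null;
            }
            return proto;
        }

        @Override
        public String toString() {
            return getProto(); // logging the record goes through the merge path
        }
    }

    /** Hammers one record from several threads; returns how many exceptions were seen. */
    static int stress(int threads, int iterations) {
        StatusRecord record = new StatusRecord();
        AtomicInteger failures = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final boolean writer = (t % 2 == 0);
            pool.submit(() -> {
                for (int i = 0; i < iterations; i++) {
                    try {
                        if (writer) {
                            record.setState("COMPLETE-" + i);
                        } else {
                            String.valueOf(record); // same route as string concatenation in a log call
                        }
                    } catch (RuntimeException e) {
                        failures.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("failures: " + stress(4, 100_000));
    }
}
```

      In this sketch, removing the synchronized keyword from getProto() and setState() lets two threads interleave inside the merge, which is the same shape of failure as the mergeLocalToBuilder frame in the stack trace.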

      Attachments

        1. YARN-2387.patch
          1 kB
          Mit Desai
        2. YARN-2387.patch
          3 kB
          Mit Desai
        3. YARN-2387.patch
          3 kB
          Mit Desai

          People

            Assignee: Mit Desai (mitdesai)
            Reporter: Mit Desai (mitdesai)
            Votes: 0
            Watchers: 9
