  1. Hadoop YARN
  2. YARN-7249

Fix CapacityScheduler NPE when a container is preempted while its node is being removed

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.8.1, 2.7.5
    • Fix Version/s: 2.8.2, 2.7.6
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      This issue can occur when all three of the following conditions hold:

      1) A node is being removed from the scheduler.
      2) A container running on that node is being preempted.
      3) A rare race condition causes the scheduler to pass a null node to the leaf queue.

      The fix is to add a null check on the node inside CapacityScheduler, before the node is handed to the leaf queue.
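
      The guard described above can be illustrated with a minimal sketch. This is not the committed patch: the classes NullNodeGuardSketch, Node, Queue and Scheduler are hypothetical, simplified stand-ins for the real YARN classes, and returning early when the node is missing is an assumption about how the check behaves.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-ins; these are NOT the real YARN classes.
public class NullNodeGuardSketch {

    /** Stand-in for a scheduler node that tracks its running containers. */
    static final class Node {
        int runningContainers;

        Node(int runningContainers) {
            this.runningContainers = runningContainers;
        }
    }

    /** Stand-in for a leaf queue; it dereferences the node it is given. */
    static final class Queue {
        int completed;

        void completedContainer(Node node) {
            node.runningContainers--; // throws NullPointerException if node is null
            completed++;
        }
    }

    /** Stand-in for the scheduler that owns the node map and the queue. */
    static final class Scheduler {
        final Map<String, Node> nodes = new ConcurrentHashMap<>();
        final Queue queue = new Queue();

        /** Returns true if the queue was updated, false if the node was already gone. */
        boolean completedContainer(String nodeId) {
            Node node = nodes.get(nodeId);
            if (node == null) {
                // Assumed behavior: the node was removed concurrently, so skip
                // the queue update instead of passing a null node along.
                return false;
            }
            queue.completedContainer(node);
            return true;
        }
    }

    public static void main(String[] args) {
        Scheduler scheduler = new Scheduler();
        scheduler.nodes.put("node-1", new Node(1));

        // Node still registered: the completion reaches the queue.
        System.out.println(scheduler.completedContainer("node-1")); // prints "true"

        // Node removed before the kill event is handled: the guard skips the queue.
        scheduler.nodes.remove("node-1");
        System.out.println(scheduler.completedContainer("node-1")); // prints "false"
    }
}
```

      Without the null check in Scheduler.completedContainer, the second call would reach Queue.completedContainer with a null node and fail in the same way as the stack trace below.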

      Stack trace:

      2017-08-31 02:51:24,748 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(714)) - Error in handling event type KILL_RESERVED_CONTAINER to the scheduler 
      java.lang.NullPointerException 
      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1308) 
      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1469) 
      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:497) 
      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.killReservedContainer(CapacityScheduler.java:1505) 
      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1341) 
      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:127) 
      at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:705) 
      

      This issue exists only in 2.8.x.

            People

            • Assignee:
              leftnoteasy Wangda Tan
              Reporter:
              leftnoteasy Wangda Tan