Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
2.9.0, 3.0.0
-
None
Description
While running an MR job (e.g. sleep) and an RM failover occurs, once the maps gets to 100%, the now active RM will crash due to:
2017-10-18 15:02:05,347 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1508361403235_0001_01_000002 Container Transitioned from RUNNING to COMPLETED 2017-10-18 15:02:05,347 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=systest OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1508361403235_0001 CONTAINERID=container_1508361403235_0001_01_000002 RESOURCE=<memory:1024, vCores:1> 2017-10-18 15:02:05,349 FATAL org.apache.hadoop.yarn.event.EventDispatcher: Error in handling event type NODE_UPDATE to the Event Dispatcher java.util.NoSuchElementException at java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036) at java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.getNextPendingAsk(AppSchedulingInfo.java:371) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.isOverAMShareLimit(FSAppAttempt.java:901) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:1326) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:371) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:221) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:221) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1019) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:887) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1104) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:128) at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66) at java.lang.Thread.run(Thread.java:748) 2017-10-18 15:02:05,360 INFO org.apache.hadoop.yarn.event.EventDispatcher: Exiting, bbye..
This leaves the cluster with no RMs!
Attachments
Attachments
Issue Links
- relates to
-
YARN-9552 FairScheduler: NODE_UPDATE can cause NoSuchElementException
- Resolved