Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
2.6.0
-
None
-
Reviewed
Description
Branch:
2.6.0
Environment:
A 3-node cluster with RM HA enabled. The HA setup went pretty smooth (used Ambari) and then installed HBase using Slider. After some time the RMs went down and would not come back up anymore. Following is the NPE we see in both the RM logs.
2014-09-16 01:36:28,037 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(612)) - Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.transferStateFromPreviousAttempt(SchedulerApplicationAttempt.java:530) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:678) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1015) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:603) at java.lang.Thread.run(Thread.java:744) 2014-09-16 01:36:28,042 INFO resourcemanager.ResourceManager (ResourceManager.java:run(616)) - Exiting, bbye..
All the logs for this 3-node cluster has been uploaded.
Attachments
Attachments
Issue Links
- duplicates
-
YARN-2822 NPE when RM tries to transfer state from previous attempt on recovery
- Resolved