Details
Bug
Status: Open
Minor
Resolution: Unresolved
3.3.3
None
None
Description
The ResourceManager (RM) crashes when the node label of a node is changed while node labels are in distributed configuration mode.
2023-01-11 16:25:50,986 ERROR org.apache.hadoop.yarn.event.EventDispatcher (SchedulerEventDispatcher:Event Processor): Error in handling event type NODE_REMOVED to the Event Dispatcher
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.removeNode(ClusterNodeTracker.java:194)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.removeNode(CapacityScheduler.java:2145)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1833)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
    at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:83)
    at java.lang.Thread.run(Thread.java:750)
Repro
1. Start two NodeManagers with the CORE node label, using these settings:
yarn.nodemanager.node-labels.provider.configured-node-partition=CORE
yarn.node-labels.enabled=true
yarn.node-labels.configuration-type=distributed
yarn.nodemanager.node-labels.provider=config
2. Remove the node label from one of the nodes so that it moves to the default partition, then restart that NodeManager.
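
For anyone scripting the repro, the two configuration states can be sketched as plain key/value sets. This is only an illustration: the class and method names (NodeLabelRepro, labeledNode, unlabeledNode) are hypothetical, it uses java.util.Properties rather than any Hadoop API, and it assumes that "removing the node label" means deleting the configured-node-partition property from that node's configuration.

```java
import java.util.Properties;

/** Hypothetical helper that models the NodeManager settings from the repro steps. */
public class NodeLabelRepro {
    public static final String PARTITION_KEY =
        "yarn.nodemanager.node-labels.provider.configured-node-partition";

    /** Step 1: settings used by both NodeManagers, labeled CORE. */
    public static Properties labeledNode() {
        Properties p = new Properties();
        p.setProperty("yarn.node-labels.enabled", "true");
        p.setProperty("yarn.node-labels.configuration-type", "distributed");
        p.setProperty("yarn.nodemanager.node-labels.provider", "config");
        p.setProperty(PARTITION_KEY, "CORE");
        return p;
    }

    /** Step 2 (assumption): the label is removed by deleting the partition key. */
    public static Properties unlabeledNode() {
        Properties p = labeledNode();
        p.remove(PARTITION_KEY);
        return p;
    }

    public static void main(String[] args) {
        // Print the partition setting before and after the label is removed.
        System.out.println("before: " + labeledNode().getProperty(PARTITION_KEY));
        System.out.println("after:  " + unlabeledNode().getProperty(PARTITION_KEY));
    }
}
```

Only the partition key differs between the two states; the other three settings stay the same on the node that is restarted.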