Description
*Description:* While restarting the NMs, the RM logs org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: FINISHED_CONTAINERS_PULLED_BY_AM at NEW
*Environment:*
Server OS: Ubuntu
Cluster nodes: 2 RMs / 4850 NMs
240 machines in total, each running 21 Docker containers (1 DN and 20 NMs)
*Steps:*
1. Have about 53000 containers in the running state.
2. Restart the NMs and check the RM log.
2019-06-24 09:37:35,345 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with id 32744 submitted by user root
2019-06-24 09:37:35,346 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root IP=255.255.19.245 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1561358926330_32744 QUEUENAME=default
2019-06-24 09:37:35,345 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: FINISHED_CONTAINERS_PULLED_BY_AM at NEW
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
    at org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:669)
    at org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:99)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher.handle(ResourceManager.java:1107)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher.handle(ResourceManager.java:1091)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:221)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:143)
    at java.lang.Thread.run(Thread.java:748)
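
To gauge how often the error above shows up after an NM restart, a small standalone helper can tally the "Invalid event: ... at ..." messages in an RM log. This is only a sketch and not part of Hadoop: the class name InvalidTransitionCounter, the "EVENT@STATE" key format, and the log path passed on the command line are all assumptions made for illustration. It relies only on the message text visible in the log excerpt.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Hadoop class): counts invalid state-transition
// messages in ResourceManager log lines.
public class InvalidTransitionCounter {
    // Matches the message text seen in the log excerpt, e.g.
    // "Invalid event: FINISHED_CONTAINERS_PULLED_BY_AM at NEW".
    private static final Pattern INVALID_EVENT =
        Pattern.compile("Invalid event: (\\w+) at (\\w+)");

    // Returns occurrence counts keyed by "EVENT@STATE" (key format is an assumption).
    public static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = INVALID_EVENT.matcher(line);
            // A pasted log may hold several messages on one line, so scan all matches.
            while (m.find()) {
                counts.merge(m.group(1) + "@" + m.group(2), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java InvalidTransitionCounter <rm-log-file>");
            return;
        }
        // args[0] is the path to an RM log file; the location depends on the deployment.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        count(lines).forEach((key, n) -> System.out.println(key + " " + n));
    }
}
```

Run against the RM log taken before and after the restart, the per-key counts give a rough idea of whether the exception appears once per restarted NM or more often.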