Details
Bug
Status: Open
Blocker
Resolution: Unresolved
2.7.3
None
None
Description
We have 1000 nodes in the cluster. Recently I found that when many tasks are submitted to the ResourceManager, an application takes 5-8 minutes to move from the NEW state to NEW_SAVING, and an app attempt takes almost as long to move from ALLOCATED_SAVING to ALLOCATED. I think the problem is in RMStateStore#handleStoreEvent, because both code paths end up calling this method:
Has anyone encountered the same problem?
protected void handleStoreEvent(RMStateStoreEvent event) {
  this.writeLock.lock();
  try {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Processing event of type " + event.getType());
    }
    final RMStateStoreState oldState = getRMStateStoreState();
    this.stateMachine.doTransition(event.getType(), event);
    if (oldState != getRMStateStoreState()) {
      LOG.info("RMStateStore state change from " + oldState + " to " + getRMStateStoreState());
    }
  } catch (InvalidStateTransitonException e) {
    LOG.error("Can't handle this event at current state", e);
  } finally {
    this.writeLock.unlock();
  }
}
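
To illustrate the suspicion above, here is a minimal standalone sketch (not Hadoop code) of what happens when every store event is handled under a single write lock and each store write is slow. The class name SerializedStoreSketch, the fixed per-event latency, and the counters are all hypothetical and only stand in for the real state-store write. It assumes the write itself happens while the lock is held, as it appears to in the method above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model: every event is processed while holding one write lock.
public class SerializedStoreSketch {
    private final ReentrantReadWriteLock.WriteLock writeLock =
        new ReentrantReadWriteLock().writeLock();
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxInFlight = new AtomicInteger();
    private final AtomicInteger handled = new AtomicInteger();
    private final long storeLatencyMillis;

    public SerializedStoreSketch(long storeLatencyMillis) {
        this.storeLatencyMillis = storeLatencyMillis;
    }

    // Mirrors the lock/try/finally shape of handleStoreEvent.
    public void handleStoreEvent() throws InterruptedException {
        writeLock.lock();
        try {
            int now = inFlight.incrementAndGet();
            maxInFlight.accumulateAndGet(now, Math::max);
            // Assumed stand-in for a slow state-store write.
            Thread.sleep(storeLatencyMillis);
            handled.incrementAndGet();
        } finally {
            inFlight.decrementAndGet();
            writeLock.unlock();
        }
    }

    // Submits the events from several threads and returns the elapsed milliseconds.
    public long runEvents(int events, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int i = 0; i < events; i++) {
            pool.execute(() -> {
                try {
                    handleStoreEvent();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public int getHandled() { return handled.get(); }

    public int getMaxInFlight() { return maxInFlight.get(); }

    public static void main(String[] args) throws InterruptedException {
        SerializedStoreSketch sketch = new SerializedStoreSketch(5);
        long elapsed = sketch.runEvents(200, 8);
        System.out.println("handled=" + sketch.getHandled()
            + " maxInFlight=" + sketch.getMaxInFlight()
            + " elapsedMillis=" + elapsed);
    }
}
```

In this model the number of submitting threads does not help: at most one event is inside the lock at any time, so the total time grows roughly as the number of events times the per-event latency. If the real store write is slow under load, a backlog of events could plausibly add up to minutes, which would be consistent with the delay reported here, though this sketch does not prove that is the cause.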