[YARN-9673] RMStateStore writeLock make app waste more time - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Blocker
Resolution: Unresolved
Affects Version/s: 2.7.3
Fix Version/s: None
Component/s: yarn
Labels: None

Description

We have 1000 nodes in the cluster. Recently I found that when many tasks are submitted to the ResourceManager, an application takes 5-8 minutes to move from the NEW state to NEW_SAVING, and an app attempt takes almost as long to move from ALLOCATED_SAVING to ALLOCATED. I think the problem is in RMStateStore#handleStoreEvent: both transitions end up calling this method, and it holds the write lock for the whole state transition.

Has anyone encountered the same problem?

    protected void handleStoreEvent(RMStateStoreEvent event) {
      this.writeLock.lock();
      try {
        if (LOG.isDebugEnabled()) {
          LOG.debug("Processing event of type " + event.getType());
        }

        final RMStateStoreState oldState = getRMStateStoreState();

        this.stateMachine.doTransition(event.getType(), event);

        if (oldState != getRMStateStoreState()) {
          LOG.info("RMStateStore state change from " + oldState + " to "
              + getRMStateStoreState());
        }
      } catch (InvalidStateTransitonException e) {
        LOG.error("Can't handle this event at current state", e);
      } finally {
        this.writeLock.unlock();
      }
    }
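
To make the suspected effect concrete, here is a minimal standalone sketch (not ResourceManager code) of several threads pushing events through a handler that holds one write lock for the whole operation. The class name StoreLockDemo, the run method, and the sleep that stands in for the state store write are all assumptions made for illustration. It also simplifies the real code path by having callers contend on the lock directly. The point is only that at most one event is ever inside the handler, so total time grows roughly as number of events times per-event store latency.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; it only mimics the locking shape of handleStoreEvent.
class StoreLockDemo {
    private final Lock writeLock = new ReentrantReadWriteLock().writeLock();
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxInFlight = new AtomicInteger();
    private final AtomicInteger handled = new AtomicInteger();
    private final long storeMillis; // assumed cost of one state store write

    StoreLockDemo(long storeMillis) {
        this.storeMillis = storeMillis;
    }

    // Same pattern as the method above: lock, do the slow work, unlock.
    void handleStoreEvent() throws InterruptedException {
        writeLock.lock();
        try {
            int now = inFlight.incrementAndGet();
            maxInFlight.accumulateAndGet(now, Math::max);
            Thread.sleep(storeMillis); // stand-in for the store write
            handled.incrementAndGet();
        } finally {
            inFlight.decrementAndGet();
            writeLock.unlock();
        }
    }

    // Returns {events handled, max events inside the handler at once, elapsed ms}.
    static long[] run(int threads, int eventsPerThread, long storeMillis)
            throws InterruptedException {
        StoreLockDemo store = new StoreLockDemo(storeMillis);
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    for (int j = 0; j < eventsPerThread; j++) {
                        store.handleStoreEvent();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new long[] {store.handled.get(), store.maxInFlight.get(), elapsedMillis};
    }

    public static void main(String[] args) throws InterruptedException {
        long[] r = run(4, 5, 2);
        System.out.println("handled=" + r[0] + " maxInFlight=" + r[1]
                + " elapsedMs=" + r[2]);
    }
}
```

In this model, adding more submitting threads does not shorten the total time, because every event still waits its turn for the lock. Whether the same thing explains the 5-8 minute delay depends on how long each store write takes in the real cluster, which this sketch does not measure.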

People

Assignee: Unassigned

Reporter: chan

Votes: 0

Watchers: 7

Dates

Created: 14/Jul/19 11:36

Updated: 22/Sep/22 03:57