[YARN-9826] Blocked threads at EntityGroupFSTimelineStore#getCachedStore - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 2.7.3
Fix Version/s: None
Component/s: timelineserver
Labels: None

Description

We have observed several times on our production cluster that hundreds of TimelineServer threads become blocked at the following synchronized block in EntityGroupFSTimelineStore#getCachedStore when our HDFS NameNode is under high load.

    synchronized (this.cachedLogs) {
      // Note that the content in the cache log storage may be stale.
      cacheItem = this.cachedLogs.get(groupId);
      if (cacheItem == null) {
        LOG.debug("Set up new cache item for id {}", groupId);
        cacheItem = new EntityCacheItem(groupId, getConfig());
        AppLogs appLogs = getAndSetAppLogs(groupId.getApplicationId());
        if (appLogs != null) {
          LOG.debug("Set applogs {} for group id {}", appLogs, groupId);
          cacheItem.setAppLogs(appLogs);
          this.cachedLogs.put(groupId, cacheItem);
        } else {
          LOG.warn("AppLogs for groupId {} is set to null!", groupId);
        }
      }
    }

The one thread that holds the lock performs multiple file system operations (fs.exists) inside getAndSetAppLogs. When those calls are slow, for instance because the NameNode RPC queue is full, every other thread waiting on the lock is blocked as well.

One possible solution is to move getAndSetAppLogs outside the synchronized block.
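
A minimal sketch of that idea follows. It is not a patch against EntityGroupFSTimelineStore: the class name CachedStoreSketch, the generic types, and the slowLoader function (standing in for getAndSetAppLogs) are all hypothetical, and a plain HashMap stands in for the real cache. The lock is held only for map access, while the slow lookup runs unlocked.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for the cache handling in getCachedStore. */
class CachedStoreSketch<K, V> {
  private final Map<K, V> cachedLogs = new HashMap<>();
  // Assumption: stands in for getAndSetAppLogs, which may block on HDFS calls.
  private final Function<K, V> slowLoader;

  CachedStoreSketch(Function<K, V> slowLoader) {
    this.slowLoader = slowLoader;
  }

  V getCachedStore(K groupId) {
    // Fast path: take the lock only for the map lookup.
    synchronized (cachedLogs) {
      V cached = cachedLogs.get(groupId);
      if (cached != null) {
        return cached;
      }
    }

    // Slow path: no lock is held here, so a stalled file system call
    // delays only this thread, not every caller of getCachedStore.
    V loaded = slowLoader.apply(groupId);
    if (loaded == null) {
      return null; // nothing to cache, mirroring the "appLogs == null" branch
    }

    // Re-check under the lock: another thread may have loaded the same
    // group id in the meantime. Keep the first entry, discard ours.
    synchronized (cachedLogs) {
      V existing = cachedLogs.putIfAbsent(groupId, loaded);
      return existing != null ? existing : loaded;
    }
  }
}
```

The trade-off in this sketch is that two threads asking for the same uncached group id may both run the slow lookup, and one result is thrown away. Whether that duplicate work is acceptable for getAndSetAppLogs, which also updates other state, would need to be checked against the real code.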

People

Assignee: Unassigned

Reporter: Harunobu Daikoku

Votes: 0

Watchers: 6

Dates

Created: 11/Sep/19 09:13

Updated: 28/Dec/20 09:17