[YARN-8193] YARN RM hangs abruptly (stops allocating resources) when running successive applications. - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.2.0, 3.1.1, 2.10.1
Component/s: yarn
Labels: None
Target Version/s: 2.9.2
Hadoop Flags: Reviewed

Description

When massive queries are run in succession, at some point the RM hangs and stops allocating resources. At the moment the RM hangs, YARN throws a NullPointerException in RegularContainerAllocator.getLocalityWaitFactor.

Sufficient space is allocated to yarn.nodemanager.local-dirs, so this is not a node-health issue; the RM did not report any node as unhealthy. There is no fixed trigger (specific query or operation) for the hang.

The problem goes away after restarting the ResourceManager; no NodeManager restart is required.

Attachments

- YARN-8193.001.patch (5 kB), Zian Chen, 23/Apr/18 22:20
- YARN-8193.002.patch (5 kB), Zian Chen, 25/Apr/18 20:21
- YARN-8193-branch-2.10-001.patch (5 kB), Jonathan Hung, 30/Apr/20 17:32
- YARN-8193-branch-2.9.0-001.patch (5 kB), Juanjuan Tian, 30/Jun/18 05:05
- YARN-8193-branch-2-001.patch (5 kB), Jason Darrell Lowe, 06/Jul/18 15:44

Issue Links

is duplicated by:
- YARN-8462 Resource Manager shutdown with FATAL Exception (Resolved)
- YARN-8471 YARN RM hangs and stops allocating resources when applications successively running (Resolved)

People

Assignee: Zian Chen
Reporter: Zian Chen
Votes: 0
Watchers: 17

Dates

Created: 20/Apr/18 21:36
Updated: 30/Apr/20 19:39
Resolved: 30/Apr/20 19:17