[YARN-9164] Shutdown NM may cause NPE when opportunistic container scheduling is enabled - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0.4, 3.1.2, 3.3.0, 3.2.1
Component/s: None
Labels: None
Hadoop Flags: Reviewed

Description

We hit an NPE that can crash the whole cluster:

2018-12-31 22:18:11,924 FATAL org.apache.hadoop.yarn.event.EventDispatcher: Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:696)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1123)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1827)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
at java.lang.Thread.run(Thread.java:745)

This bug also occurs on the latest trunk.

The workload is:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-$VERSION.jar pi -Dmapreduce.job.num-opportunistic-maps-percent="100" 50 100

While the job is running, shut down one NM.

Reproducing it also requires injecting a sleep before AbstractYarnScheduler.getNode().
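
The trace and the steps above are consistent with a node lookup returning null once the NM has been removed, with the result then dereferenced. Below is a minimal, self-contained Java sketch of that pattern, assuming that reading is correct. Every class and method name in it is a hypothetical stand-in; it is not Hadoop code and not the attached patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NodeRemovalRaceSketch {

    /** Hypothetical stand-in for a scheduler's per-node bookkeeping. */
    static final class Node {
        private int released = 0;

        void releaseContainer(String containerId) {
            released++;
        }

        int releasedCount() {
            return released;
        }
    }

    /** Hypothetical stand-in for a scheduler that tracks nodes by id. */
    static final class Scheduler {
        private final Map<String, Node> nodes = new ConcurrentHashMap<>();

        void addNode(String nodeId) {
            nodes.put(nodeId, new Node());
        }

        void removeNode(String nodeId) {
            nodes.remove(nodeId);
        }

        /** Returns null when the node has already been removed. */
        Node getNode(String nodeId) {
            return nodes.get(nodeId);
        }

        /** Unguarded: throws NullPointerException if the node is gone. */
        void completedContainerUnguarded(String nodeId, String containerId) {
            getNode(nodeId).releaseContainer(containerId);
        }

        /** Guarded: skips the release and returns false if the node is gone. */
        boolean completedContainerGuarded(String nodeId, String containerId) {
            Node node = getNode(nodeId);
            if (node == null) {
                return false;
            }
            node.releaseContainer(containerId);
            return true;
        }
    }

    public static void main(String[] args) {
        Scheduler scheduler = new Scheduler();
        scheduler.addNode("nm1");
        // Simulate the NM shutting down before the container completion is handled.
        scheduler.removeNode("nm1");

        try {
            scheduler.completedContainerUnguarded("nm1", "container_1");
        } catch (NullPointerException e) {
            System.out.println("unguarded path threw NullPointerException");
        }
        System.out.println("guarded path released container: "
                + scheduler.completedContainerGuarded("nm1", "container_1"));
    }
}
```

In this sketch the sleep mentioned above would correspond to delaying the getNode() call long enough for removeNode() to run first, which is what makes the null result reachable.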

Attachments

hadoop-hires-resourcemanager-hadoop11.log (31/Dec/18 14:41, 71 kB, lujie)
YARN-9164-0.patch (01/Jan/19 07:59, 1 kB, lujie)
YARN-9164-1.patch (02/Jan/19 15:47, 12 kB, lujie)
YARN-9164-2.patch (03/Jan/19 02:02, 12 kB, lujie)

Issue Links

duplicates

YARN-9165 NPE which is similar to YARN-5918

Resolved

People

Assignee: lujie

Reporter: lujie

Votes: 0

Watchers: 6

Dates

Created: 31/Dec/18 14:39

Updated: 03/Jan/19 23:53

Resolved: 03/Jan/19 16:59