[MAPREDUCE-7028] Concurrent task progress updates causing NPE in Application Master - ASF JIRA


Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
Fix Version/s: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
Component/s: mr-am
Labels: None
Target Version/s: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6

Description

Concurrent task progress updates can cause a NullPointerException in the Application Master (the stack trace below was produced with code at the current trunk):

2017-12-20 06:49:42,369 INFO [IPC Server handler 9 on 39501] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1513780867907_0001_m_000002_0 is : 0.02677883
2017-12-20 06:49:42,369 INFO [IPC Server handler 13 on 39501] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1513780867907_0001_m_000002_0 is : 0.02677883
2017-12-20 06:49:42,383 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.NullPointerException
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$StatusUpdater.transition(TaskAttemptImpl.java:2450)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$StatusUpdater.transition(TaskAttemptImpl.java:2433)
    at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1362)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:154)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1543)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1535)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
    at java.lang.Thread.run(Thread.java:748)
2017-12-20 06:49:42,385 INFO [IPC Server handler 13 on 39501] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1513780867907_0001_m_000002_0 is : 0.02677883
2017-12-20 06:49:42,386 INFO [AsyncDispatcher ShutDown handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..

This happened naturally in several large wordcount runs, and I was able to reproduce it reliably by artificially making task progress updates more frequent.
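The kind of race described above can be modeled outside Hadoop. The Java sketch below is a hypothetical, simplified model, not the MRAppMaster code or the committed patch; the class `StatusCoalescingSketch` and its methods `reportRacy`, `reportSafe`, `consume` and `run` are invented for illustration. It assumes that IPC handler threads deposit the latest progress in a shared `AtomicReference` and enqueue a dispatcher event only when no status is pending, and that the dispatcher thread takes the status with `getAndSet(null)`. With a separate get-then-set check, two handlers reporting at the same moment (as in the two log lines at 06:49:42,369) can both enqueue an event, so the second event finds no status, which is the analogue of the NullPointerException in `StatusUpdater.transition`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of coalesced progress reporting; NOT Hadoop code.
// Producer threads stand in for IPC handlers, one consumer for the AsyncDispatcher.
final class StatusCoalescingSketch {
    private final AtomicReference<Float> pending = new AtomicReference<>();
    private final BlockingQueue<Boolean> events = new LinkedBlockingQueue<>();
    private final AtomicInteger nullStatusSeen = new AtomicInteger();

    // Racy check-then-act: two threads can both see null and both enqueue an event.
    void reportRacy(float progress) {
        boolean needsEvent = pending.get() == null;
        pending.set(progress);
        if (needsEvent) {
            events.add(Boolean.TRUE);
        }
    }

    // Atomic swap: only the thread that replaces null enqueues an event.
    void reportSafe(float progress) {
        if (pending.getAndSet(progress) == null) {
            events.add(Boolean.TRUE);
        }
    }

    // Dispatcher side: take the latest status for one event.
    private void consume() {
        if (pending.getAndSet(null) == null) {
            // Dereferencing this missing status is where the AM would throw an NPE.
            nullStatusSeen.incrementAndGet();
        }
    }

    // Runs `threads` producers sending `updates` reports each and returns
    // the number of events that found no pending status.
    static int run(boolean safe, int threads, int updates) throws InterruptedException {
        StatusCoalescingSketch s = new StatusCoalescingSketch();
        AtomicBoolean producing = new AtomicBoolean(true);
        Thread dispatcher = new Thread(() -> {
            try {
                while (producing.get() || !s.events.isEmpty()) {
                    if (s.events.poll(1, TimeUnit.MILLISECONDS) != null) {
                        s.consume();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        dispatcher.start();

        CountDownLatch start = new CountDownLatch(1); // release producers together
        List<Thread> producers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread p = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < updates; i++) {
                    float progress = (float) i / updates;
                    if (safe) {
                        s.reportSafe(progress);
                    } else {
                        s.reportRacy(progress);
                    }
                }
            });
            producers.add(p);
            p.start();
        }
        start.countDown();
        for (Thread p : producers) {
            p.join();
        }
        producing.set(false); // all events are queued once every producer has joined
        dispatcher.join();
        return s.nullStatusSeen.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("racy: " + run(false, 8, 100_000) + " events without a status");
        System.out.println("safe: " + run(true, 8, 100_000) + " events without a status");
    }
}
```

Running `main` prints, for each variant, how many events found no status. The racy count varies from run to run and may be zero on a lightly loaded machine, much as the real failure needed big jobs or artificially frequent updates to surface. The `getAndSet` variant should always report zero: only the thread that swaps out `null` enqueues an event, so every queued event is matched by exactly one non-null deposit that the single dispatcher thread consumes.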

Attachments


MAPREDUCE-7028.000.patch (20/Dec/17 16:01, 4 kB, Gergo Repas)
MAPREDUCE-7028.001.patch (21/Dec/17 14:36, 3 kB, Gergo Repas)
MAPREDUCE-7028.002.patch (02/Jan/18 17:39, 3 kB, Gergo Repas)
MAPREDUCE-7028.003.patch (02/Jan/18 18:29, 3 kB, Gergo Repas)
MAPREDUCE-7028.004.patch (02/Jan/18 19:12, 3 kB, Gergo Repas)

Issue Links

is broken by: MAPREDUCE-5124 AM lacks flow control for task events (Resolved)


People

Assignee: Gergo Repas

Reporter: Gergo Repas

Votes: 0

Watchers: 7

Dates

Created: 20/Dec/17 15:33

Updated: 26/Feb/20 05:29

Resolved: 03/Jan/18 17:23