[MAPREDUCE-969] NullPointerException during reduce freezes job - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 0.20.2
Fix Version/s: None
Component/s: jobtracker, task, tasktracker
Labels: None

Description

We experienced several jobs stuck in Reduce on a cluster. All of the stuck reduce tasks were stuck at "Need another 2 map output(s) where 0 is already in progress" despite all of the mappers having completed and 0 being scheduled. The stuck reducers had experienced the following exception early in the shuffle:

java.lang.NullPointerException
at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2747)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2670)

Will attach more information and logs momentarily.
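
For readers unfamiliar with the top frame of the trace above: java.util.concurrent.ConcurrentHashMap does not permit null keys, so get(null) throws a NullPointerException. The sketch below is not Hadoop code. The map name, the key type, and the "event-poller" thread are hypothetical, and the report does not say which key was null. It only illustrates that such a lookup throws, and that a background thread whose run() lets the exception escape terminates. If the thread that fetches map completion events died this way, that would be consistent with reducers waiting indefinitely for map outputs, though the report does not confirm it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

class NullKeyLookupDemo {
    // Hypothetical stand-in for a map keyed by some identifier; not Hadoop's actual field.
    static final ConcurrentHashMap<String, Integer> outputsByKey = new ConcurrentHashMap<>();

    /** Returns true if looking up a null key throws NullPointerException. */
    static boolean lookupThrowsOnNullKey() {
        String key = null; // assumption: the key resolved to null for some reason
        try {
            outputsByKey.get(key);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    /**
     * Runs the same lookup on a background thread and returns the uncaught
     * exception that terminated the thread, or null if it finished normally.
     */
    static Throwable runPollerOnce() {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread poller = new Thread(() -> outputsByKey.get((String) null), "event-poller");
        // Record the exception instead of letting the default handler print it.
        poller.setUncaughtExceptionHandler((thread, error) -> failure.set(error));
        poller.start();
        try {
            poller.join(); // the thread is gone after this; nothing restarts it
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return failure.get();
    }

    public static void main(String[] args) {
        System.out.println("get(null) throws NPE: " + lookupThrowsOnNullKey());
        System.out.println("poller terminated by: " + runPollerOnce());
    }
}
```

A long-running polling thread can guard against this by validating keys before the lookup, or by catching unexpected runtime exceptions at the top of its loop and reporting the task as failed, so the job does not sit waiting on a thread that no longer exists.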

Attachments

bad_job_events (124 kB), attached 10/Sep/09 23:57 by Todd Lipcon
bad_job_jt_logs (399 kB), attached 10/Sep/09 23:57 by Todd Lipcon
reduce_task_logs (66 kB), attached 10/Sep/09 23:57 by Todd Lipcon

People

Assignee: Todd Lipcon

Reporter: Todd Lipcon

Votes: 0

Watchers: 7

Dates

Created: 10/Sep/09 23:45

Updated: 02/Nov/09 23:49

Resolved: 02/Nov/09 23:49