Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
0.20.2
None
None
Description
We experienced several jobs stuck in the reduce phase on a cluster. All of the stuck reduce tasks reported "Need another 2 map output(s) where 0 is already in progress", even though all of the mappers had completed and no fetches were scheduled. The stuck reducers had hit the following exception early in the shuffle:
java.lang.NullPointerException
at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2747)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2670)
Will attach more information and logs momentarily.
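For context on the top frame of the trace: java.util.concurrent.ConcurrentHashMap does not permit null keys, so get(null) throws a NullPointerException from inside the map itself. The sketch below only demonstrates that JDK behavior. It does not reproduce the Hadoop code path, and the report does not confirm which key was null; the class, method, and map names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyLookup {
    // Returns true if looking up the given key throws a NullPointerException.
    static boolean lookupThrowsNpe(ConcurrentHashMap<String, Integer> map, String key) {
        try {
            map.get(key);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Hypothetical map keyed by an identifier string; not the actual ReduceTask field.
        ConcurrentHashMap<String, Integer> counters = new ConcurrentHashMap<>();
        counters.put("some_key", 0);

        // A non-null key is looked up normally.
        System.out.println(lookupThrowsNpe(counters, "some_key"));
        // A null key is rejected by ConcurrentHashMap.get itself.
        System.out.println(lookupThrowsNpe(counters, null));
    }
}
```

If a null key is the cause here, an uncaught exception of this kind would end the thread that raised it, which could explain why the reducers stopped learning about completed map outputs. That reading is an assumption until the logs are attached.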