The code in RMContainerAllocator is meant to handle this case by ramping up the number of reducers as maps finish. However, there seems to be something fishy about the total amount of memory available to the job. Compare these two log lines:
2012-05-24 16:47:25,803 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.3 totalMemLimit:63488 finalMapMemLimit:44442 finalReduceMemLimit:19046 netScheduledMapMem:117760 netScheduledReduceMem:15360
2012-05-24 16:47:07,521 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:30 ScheduledMaps:160 ScheduledReduces:0 AssignedMaps:0 AssignedReduces:0 completedMaps:0 completedReduces:0 containersAllocated:0 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:0 availableResources(headroom):memory: 32768
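For what it's worth, the figures in the first line are consistent with a simple proportional split of totalMemLimit by map progress. The sketch below is only a paraphrase of the ramp-up arithmetic (the variable names mirror the log fields, not necessarily the actual code, and the real ramp-up logic also applies limits not shown here):

    public class RampUpArithmetic {
      public static void main(String[] args) {
        float completedMapPercent = 0.3f;   // from the log line above
        int   totalMemLimit       = 63488;  // MB, from the log line above

        // Reduces get a share proportional to map progress; maps keep the rest.
        int finalReduceMemLimit = (int) (completedMapPercent * totalMemLimit);
        int finalMapMemLimit    = totalMemLimit - finalReduceMemLimit;

        // Prints 19046 44442, matching finalReduceMemLimit and finalMapMemLimit above.
        System.out.println(finalReduceMemLimit + " " + finalMapMemLimit);
      }
    }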
The first says that there is 63488 MB of memory, the second 32768 MB (both figures stay the same throughout the job). So what could be happening is that the allocator slowly ramps up the number of reducers until they use up 32768 MB (32 slots at 1024 MB apiece), thinking there is still memory available when there isn't.

The code conflates the terms 'available resource', 'headroom', and 'cluster resource': it's not clear whether the available resource is a fixed total or just what is not currently in use. RMContainerAllocator.getMemLimit() suggests the latter, while the FifoScheduler has the line application.setHeadroom(clusterResource), which suggests it is a fixed total.
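To make the ambiguity concrete, here is a rough sketch of how the two interpretations collide (class and method names are illustrative, not the actual Hadoop fields, and the 30-container figure is only an assumption chosen because it happens to reproduce the 63488 MB in the log; the real container count at that moment isn't recorded). getMemLimit() appears to compute headroom plus the memory already assigned to the job, which is only a valid total if 'headroom' means 'what is left over':

    public class HeadroomInterpretation {

      // Mirrors what RMContainerAllocator.getMemLimit() appears to do: treat the
      // scheduler's reported headroom as "memory not yet in use" and add back
      // the memory already assigned to this job's containers to get a total.
      static int memLimit(int headroomMb, int assignedMb) {
        return headroomMb + assignedMb;
      }

      public static void main(String[] args) {
        int reportedHeadroom = 32768;      // the fixed figure from the second log line
        int assignedToJob    = 30 * 1024;  // hypothetical: 30 containers of 1024 MB each

        // If, as FifoScheduler's application.setHeadroom(clusterResource) suggests,
        // 32768 MB is really the whole cluster, this "total" double-counts the
        // assigned containers and comes out at 63488 MB, an inflated limit for
        // the allocator to ramp reducers up against.
        System.out.println(memLimit(reportedHeadroom, assignedToJob)); // 63488
      }
    }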