I forgot I actually do have a combiner.
In the combiner, I call reporter.progress() every 1000 values.
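The relevant part of the combiner is along these lines (just a sketch: the real key/value classes and the actual aggregation are left out, since they are not the point here):

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Sketch only: K and V stand in for my actual key/value classes.
    public class ProgressReportingCombiner<K, V> extends MapReduceBase
            implements Reducer<K, V, K, V> {

        @Override
        public void reduce(K key, Iterator<V> values,
                           OutputCollector<K, V> output, Reporter reporter)
                throws IOException {
            long count = 0;
            while (values.hasNext()) {
                V value = values.next();
                // ... fold value into the running aggregate for this key ...
                if (++count % 1000 == 0) {
                    // Ping the framework so the task is not killed for inactivity.
                    reporter.progress();
                }
            }
            // output.collect(key, aggregate);
        }
    }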
The numbers are a bit high for the number of mappers/reducers I am using, but it was an exploratory job.
Map input records: 454,219
Map output records: 29,528,547,433
Map output bytes: 503,179,031,513
Combine output records: map=56,287,259,615, reduce=13,553,888,779
Reduce input records: 15,567,573,707
Reduce output records: 2,509,983
Reduce shuffle bytes: 337,876,374,027
At first I tried with 400 mappers and 100 reducers, and I got the timeouts. The job manages to finish with 2000 mappers and 200 reducers.
I tried with a larger input and I had the same timeouts.
The size of the record shouldn't be that large.
The key is always an int pair.
The value is either an int-float pair (most of them: 29,528,011,793 records) or an array of long-double pairs (535,640 records, 649,693,592 bytes in total). I am using MultipleInputs and GenericWritable to shuffle them together; could this be the culprit?
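For reference, the wiring looks roughly like this (a sketch with made-up names: IntFloatWritable, LongDoubleArrayWritable, the mapper classes and paths are placeholders for my actual ones):

    import org.apache.hadoop.io.GenericWritable;
    import org.apache.hadoop.io.Writable;

    // Wrapper so both value types can travel through the same shuffle.
    public class ValueWrapper extends GenericWritable {
        @SuppressWarnings("unchecked")
        private static final Class<? extends Writable>[] TYPES = new Class[] {
            IntFloatWritable.class,        // int-float pair: the common case
            LongDoubleArrayWritable.class  // array of long-double pairs: the rare, larger case
        };

        @Override
        protected Class<? extends Writable>[] getTypes() {
            return TYPES;
        }
    }

    // In the driver (placeholder mapper classes and input paths):
    //   MultipleInputs.addInputPath(conf, new Path(pairsInput),
    //           SequenceFileInputFormat.class, PairsMapper.class);
    //   MultipleInputs.addInputPath(conf, new Path(arraysInput),
    //           SequenceFileInputFormat.class, ArraysMapper.class);
    //   conf.setMapOutputValueClass(ValueWrapper.class);
    // Each mapper wraps its value with ValueWrapper.set(...) before collecting it.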