It would be nice to have a map/reduce job keep aggregated counts for arbitrary events occurring in its tasks – the number of records processed, the number of exceptions of a specific type, the number of sentences in passive voice, whatever the job finds useful.
This can be implemented by tasks periodically sending <name, value> pairs to the job tracker (in some implementations such messages are piggybacked on the heartbeats), so that the job tracker stores the latest values from each task and aggregates them on request. The job tracker should also make the aggregated values available at the end of the job. The value for a task would be flushed when the task fails.
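
A rough sketch of the job tracker side of this idea, just to make the proposal concrete. The class and method names (CounterAggregator, report, taskFailed, aggregate) are made up for illustration and are not an existing API. It assumes each report carries the task's full current counter values rather than deltas, and that "flushed" means the failed task's values are dropped from the totals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical job-tracker-side store; not part of any existing API.
class CounterAggregator {
    // task id -> (counter name -> latest value reported by that task)
    private final Map<String, Map<String, Long>> latestByTask = new HashMap<>();

    // Called whenever a task reports its counters, e.g. along with a heartbeat.
    // Assumption: the report holds cumulative values, so it replaces the previous one.
    public synchronized void report(String taskId, Map<String, Long> counters) {
        latestByTask.put(taskId, new HashMap<>(counters));
    }

    // Assumption: a failed task's counts are discarded so they do not
    // pollute the job totals (a retry will report its own values).
    public synchronized void taskFailed(String taskId) {
        latestByTask.remove(taskId);
    }

    // Sums the latest value of every counter across all known tasks.
    // Computed on request, and usable as the final result at job end.
    public synchronized Map<String, Long> aggregate() {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> counters : latestByTask.values()) {
            for (Map.Entry<String, Long> e : counters.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }
}
```

Keeping only the latest cumulative value per task makes a repeated or lost report harmless, since a later report simply overwrites the earlier one. Sending deltas instead would be cheaper per message but would need duplicate detection.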
#491 and #490 may be related to this one.