Attaching a patch that implements this improvement. This patch includes a test case which launches 6 mappers concurrently; these mappers run on a variety of schedules (some are faster, some are slower) in an attempt to suss out any race conditions that might develop.
The level of parallelism is controlled by a new parameter: mapred.local.map.tasks.maximum. This defaults to 1, so that unspecified behavior is as before.
I also tested this by running the 'pi' example from the command line:
bin/hadoop jar hadoop-mapred-examples-0.22.0-SNAPSHOT.jar pi -D mapreduce.jobtracker.address=local -D mapreduce.local.map.tasks.maximum=2 20 5000000
With mapreduce.local.map.tasks.maximum set to 1, this takes 13.5 seconds on my machine. With it set to 2 or above (I have two cores), the runtime drops to 8.5 seconds.