Here's my proposed fix:
1) Add a "free space on compute node" field to TaskTrackerStatus. This is the real physical space available, minus the sum of (space reserved - space used so far) for each running map task, i.e. the portion of each task's reservation that has not yet shown up as actual disk usage.
2) Add "space used by this task" and "space reserved for this task" fields to TaskStatus as well.
3) Add a "space to reserve" field to either Task or MapTask. This is computed by the JobTracker and used by the TaskTracker.
4) Create a new ResourceConsumptionEstimator class, with one instance per JobInProgress. It will have, at a minimum, reportCompletedMapTask(MapTaskStatus t) and estimateSpaceForMapTask(MapTask mt). The implementation would probably be a thread that processes completion reports asynchronously and updates an atomic value holding either the estimated space requirement or the estimated ratio between input size and output size. Until sufficiently many maps have completed (10%, say), the estimate for each map would just be the size of that map's input. After that, we'll take the 75th percentile of the measured blowup from input size to output size.
5) Modify obtainNewMapTask to return null if the space available on the given task tracker is less than the task's estimated space requirement.
6) To avoid deadlocks when multiple jobs contend for space, abort the job if too many trackers in a row are rejected as having insufficient space.
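To make the bookkeeping in steps 1-2 concrete, here is a minimal sketch of the free-space arithmetic. The class and field names mirror the proposal but are assumed, not the real Hadoop API, and the real TaskTrackerStatus would carry much more state:

```java
import java.util.ArrayList;
import java.util.List;

class TaskStatus {
    long spaceUsed;      // bytes this task has written so far (step 2)
    long spaceReserved;  // bytes reserved for this task when scheduled (step 2)
}

class TaskTrackerStatus {
    long physicalFreeSpace;  // measured free space on the tracker's local disks
    List<TaskStatus> runningMapTasks = new ArrayList<>();

    // Free space a scheduler may count on (step 1): the measured free space
    // minus the portion of each running task's reservation not yet consumed.
    long reportedFreeSpace() {
        long outstanding = 0;
        for (TaskStatus t : runningMapTasks) {
            outstanding += Math.max(0, t.spaceReserved - t.spaceUsed);
        }
        return physicalFreeSpace - outstanding;
    }
}
```

So a tracker with 1000 bytes physically free and one running task that has used 100 of a 400-byte reservation would report 700 bytes available for new tasks.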
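The estimator of step 4 could be sketched as below. The class name comes from the proposal, but the signatures are simplified to raw byte counts (rather than MapTaskStatus/MapTask), the methods are synchronized in place of the asynchronous thread plus atomic value, and the 10% threshold and 75th percentile follow the text:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ResourceConsumptionEstimator {
    private final int totalMaps;                         // maps in the job
    private final List<Double> blowups = new ArrayList<>();

    ResourceConsumptionEstimator(int totalMaps) {
        this.totalMaps = totalMaps;
    }

    // Record the measured input->output blowup of a completed map.
    synchronized void reportCompletedMapTask(long inputBytes, long outputBytes) {
        if (inputBytes > 0) {
            blowups.add((double) outputBytes / inputBytes);
        }
    }

    // Estimated space a map over inputBytes of input will need.
    synchronized long estimateSpaceForMapTask(long inputBytes) {
        // Until ~10% of the maps have reported, just assume output == input.
        if (blowups.size() < Math.max(1, totalMaps / 10)) {
            return inputBytes;
        }
        // 75th percentile of the observed blowup ratios.
        List<Double> sorted = new ArrayList<>(blowups);
        Collections.sort(sorted);
        double p75 = sorted.get((int) Math.ceil(0.75 * sorted.size()) - 1);
        return (long) (p75 * inputBytes);
    }
}
```

Taking a high percentile rather than the mean deliberately over-reserves: running out of disk mid-task is much more expensive than leaving some space idle.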
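The space check and deadlock guard of the last two steps might look like the following. Every name here is assumed for illustration (the real obtainNewMapTask lives on JobInProgress and returns a Task); the point is only the control flow of returning null, counting consecutive rejections, and flipping an abort flag:

```java
// Hypothetical sketch of steps 5 and 6: reject trackers that lack space,
// and give up on the job after too many consecutive rejections rather than
// deadlock against other space-hungry jobs.
class SpaceAwareScheduling {
    private final int maxRejections;         // abort threshold (step 6)
    private int consecutiveRejections = 0;
    private boolean aborted = false;

    SpaceAwareScheduling(int maxRejections) {
        this.maxRejections = maxRejections;
    }

    // Returns the task id to launch, or null if the tracker lacks space.
    Integer obtainNewMapTask(int taskId, long trackerFreeSpace,
                             long estimatedNeed) {
        if (aborted) {
            return null;
        }
        if (trackerFreeSpace < estimatedNeed) {
            if (++consecutiveRejections >= maxRejections) {
                aborted = true;   // in Hadoop this would kill the job
            }
            return null;
        }
        consecutiveRejections = 0;   // a successful placement resets the count
        return taskId;
    }

    boolean isAborted() {
        return aborted;
    }
}
```

Resetting the counter on every successful placement means only an unbroken run of rejections triggers the abort, so a job that is merely slow to place tasks is not killed.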