>Why not let it be and change site-specific, job-specific configuration?
In my experience, users don't set this until they've been around the Hadoop block for a while, and even then, this one is easy to miss.
The other reality is that few users run only "one" job. It is much more typical to run a series of jobs as part of a workflow. Doing specific, low-level tuning of every knob for every job is asking too much. Those users who do want that level of control will eventually hit this setting and tune it appropriately. But that doesn't mean we shouldn't ship a 'reasonable' default until they get around to setting it themselves.
>I think Allen's point is that the default 5% may be too low from the utilization perspective.
... and that's exactly my point. Inexperienced users wonder why all their reduce slots are not being utilized to get the max throughput of the grid. They have one big job that is holding all the reduce slots, sometimes for hours at a time, while a smaller job has all of its maps finished and just needs a handful of reduces to go. By setting this to a reasonable default, chances are this very common case will disappear out-of-the-box.
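For context, the knob under discussion appears to be the reduce slowstart threshold, which in Hadoop is the `mapred.reduce.slowstart.completed.maps` property (default 0.05, i.e. the 5% mentioned above): reducers are launched once that fraction of a job's maps have completed. A minimal sketch of raising it site-wide in `mapred-site.xml`, with 0.80 as a purely illustrative value rather than a recommended one:

```xml
<!-- mapred-site.xml: launch reducers only after most maps finish,
     so big jobs don't pin reduce slots while their maps grind on.
     0.80 here is an illustrative value, not a vetted recommendation. -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.80</value>
</property>
```

A higher value keeps reduce slots free for other jobs longer, at the cost of some shuffle overlap for the big job; individual jobs can still override it per-job if they need the old behavior.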
While I think it would be great to see this tunable go away, that's not where we are today. So let's just set this to something reasonable and then look at the bigger problem at some later date. There are bigger fish to fry.