The attached patch [MAPRED-1018-7.patch.txt] makes changes to the memory monitoring, configuration and scheduling sections. The changes relative to earlier patches are primarily presentational.
A brief summary:
- Mapreduce Tutorial: This now describes all the job-specific memory configuration options. Users who have questions about configuring memory requirements for their jobs should be able to find the answers here.
- Cluster Setup: This describes memory monitoring, links to the job-specific options in the Mapreduce Tutorial, and describes in detail how to configure the cluster-specific memory options. I've removed the duplicate description of the job-specific options, since keeping changes in sync across two places would be hard to maintain. This section covers memory-related aspects from an administrator's point of view.
- Capacity Scheduler: This describes memory-based scheduling. Instead of spelling out the precise algorithm, I have given a gist of how the scheduler works. The description focuses on what the scheduler does, rather than how it does it.
Please review. In particular, please check that all the required content is captured and that I haven't missed anything while reorganizing. Also, please check whether the documentation is easy to understand.
One thing I've not included is documentation of the RSS-based monitoring introduced in MAPREDUCE-1221, since I am not yet familiar with that part of the code. Also, this patch is already reasonably large. Hence, I would request that those changes be incorporated as a follow-up, though they should also be treated as a blocker for the 0.21 release. Thoughts?