|
According to HADOOP-4079
Marking this a blocker for 0.19, based on the decision being reached in HADOOP-4079.
Attaching new patch with following:
-Added log.info statements when default values are applied to a particular queue. The patch submitted sets the default guaranteed capacity to be 100. This doesn't seem to be right. For e.g, this won't let me get go running just by creating two queues; the total capacity flows over 100 and the scheduler fails to start.
Instead, we could knock this configuration off, take the default guaranteed capacity of a queue to be -1. While starting off the scheduler, we iterate over the list of queues. If the cumulative configured guaranteed-capacities go over 100, fail, otherwise distribute the remaining capacity (100 less cumulative guaranteed-capacities) among the queues that are not configured. This way we can kick -tart multiple queues just by creating them (i.e. simply via mapred.queue.names). The other three configuration parameters are useful in the current format. I like the idea described for guaranteed capacity. The point of the default configurations is to give a shortcut to the administrators to define queues fast. The way we are defining defaults for capacities actually does not help us accomplish this purpose. So, I am +1 for the suggestion.
Some quick review comments:
Attaching latest patch with changed incorporated from Vinod's comments:
Attaching the latest patch fixing rounding errors which would be caused if we don't round off the values which we are dividing which might reduce the actual usage of nodes in a very large clusters.
Comments:
capacity-scheduler.xml:
CapacitySchedulerConf:
CapacityTaskScheduler:
Tests:
The check we are doing should exist, for one reason which I see, capacity for queue is configured and user specifies it as negative. In that case user has specified a wrong capacity, user has option of specifying only positive non-zero values. The -1 is passed by the system and it is private and internal. I don't think it is a good idea for us to allow user to define negative capacities.
In my opinion the checks should always be present, but the question is what should be the fail over mechanism, should we pass on default value or should we throw and exception and stop the JT. I think we should do checks log.warn the negative values Log.Warn the errors and continue processing as we now have defaults configured for these values. With respect to computation calculation one place which I can see where rounding errors can possibly happen is as follows : 3 queues with no GC configured. So values for GC would be 33, 33, 33 (this would mean 1% capacity not used). Or case where 75% is to be allocated to 4 queues. You get 19 and 4 * 19 = 76 over allocation With respect to testGCAllocationToQueues: the 25% is allocated to default you have remaining 75% to be allocated to the 4 queues. So 75/4 = 18.75. Eighteen is rounded up to 19. And not all queue can have 19 as capacity, since we are internally balancing and computing the capacity so that it does not exceed more than 100, One of the queues would get 18. Attaching patch incorporating comments. Still not sure about the test cases in the TestCapacitySchedulerConf. I have removed few values and use some negative values to test if the class returns the default values.
Looks mostly right. Couple of points:
Attaching patch incorporating Hemanth's comments.
Some minor changes:
try { testConf = new CapacitySchedulerConf(new Path(testConfFile)); testConf.getInvalidReclaimTimeLimit("default"); fail("Expect invalid reclaim time limit to raise exception"); } catch(IllegalArgumentException e) { assertTrue(true); } Attaching patch with changes mentioned in the review. Modified test cases to catch the actual failure points in the code.
Some LOG.debug statements have crept into CapacityTaskScheduler which seem unrelated to this patch. Can you please remove them, and submit to Hudson ?
Otherwise, patch looks good to me. Introducing default configurable parameters for CapacityScheduler
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12391124/HADOOP-4178-8.patch against trunk revision 700056. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3392/testReport/ This message is automatically generated. I just committed this.
I can't see this as a blocker, so I changed the fixed in to 0.20. Integrated in Hadoop-trunk #620 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/620/
. Make the capacity scheduler's default values configurable. (Sreekanth Ramakrishnan via omalley) |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Modified the test case in order to write out default configuration which is required by the CapacitySchedulerConf add a new test case to test the default parameter setting.
Removed an empty method from CapacitySchedulerConf and implemented logic for reading defaults first in the CapacitySchedulerConf