First, the easy parts
Typo: "manualy" should be "manually".
usedAMResources is not used by any sub-class, so I suggest making it private.
Exception messages here should be more meaningful than "c1" or "c2".
yup - fixed
The log level here should be info or warn rather than debug. In most cases, LOG.debug() should be wrapped in a LOG.isDebugEnabled() check.
So, I had made this debug rather than something higher because I'm not sure we always care, and it doesn't represent a failure case: this is the normal/expected path, and other similar cases for not starting the app don't log at all. But I can see that it will be helpful to know this, and I don't think it will result in excessive logging, so I went ahead and made it "info" level. Sound good?

BTW, the isDebugEnabled() idiom exists to save the cost of constructing the log message's arguments, which can be very expensive. For cheap cases like this one (a string literal), it's not necessary: the only cost is the same evaluation that would happen during the logging call anyway.
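To make the guard idiom concrete, here is a minimal sketch. The Logger class below is a stand-in invented for this example (not the real commons-logging/slf4j API): the point is only that the expensive argument construction is skipped when debug is off, while a cheap string literal needs no guard.

```java
public class LogGuardSketch {
    // Minimal stand-in logger, just for illustration.
    static class Logger {
        boolean debugEnabled = false;
        boolean isDebugEnabled() { return debugEnabled; }
        void debug(String msg) { if (debugEnabled) System.out.println("DEBUG " + msg); }
        void info(String msg)  { System.out.println("INFO " + msg); }
    }

    static final Logger LOG = new Logger();

    static String expensiveSummary() {
        // Imagine walking a large data structure to build this string.
        return "app summary ...";
    }

    public static void main(String[] args) {
        // Expensive argument: guard it, so the summary string is never
        // built at all when debug logging is off.
        if (LOG.isDebugEnabled()) {
            LOG.debug("scheduler state: " + expensiveSummary());
        }
        // Cheap string literal: no guard needed; the only cost is the
        // call itself, which evaluates the same literal either way.
        LOG.info("Not starting application: amIfStarted exceeds amLimit");
    }
}
```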
Now for the more complicated one:
Looks like maxAMResourcePerQueuePercent is the allowed percentage of AM resource in each queue. So perhaps we should calculate the amLimit per queue rather than aggregating all applications together.
So, yes and no. The current behavior takes maxAMResourcePerQueuePercent, which is set globally, and apportions it out based on each queue's baseline share of the cluster. If the global value were, say, 10%, and a given queue had 50% of the cluster, that queue would have an effective value of 5%. (It's translated into "how many apps can I have running" based on the cluster's minimum allocation rather than actual AM usage, which is the problem that prompted this fix, but the important thing here is the way the overall percentage is apportioned out to the queues.) There is also the option to override on a per-queue basis: in the scenario above, if you didn't like that queue getting 5% from the overall process but were happy with how the other queues were working with the config, you could override just that queue.
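The apportionment arithmetic above can be sketched with made-up numbers (10% global percent, a queue with 50% capacity, a 100 GB cluster, 1 GB minimum allocation). This is illustrative arithmetic, not actual CapacityScheduler code:

```java
public class AmApportionSketch {
    public static void main(String[] args) {
        double maxAmPercent = 0.10;   // global maxAMResourcePerQueuePercent
        double queueCapacity = 0.50;  // queue's baseline share of the cluster
        long clusterMb = 100 * 1024;  // 100 GB cluster (assumed for the example)
        long minAllocMb = 1024;       // minimum allocation

        // Effective per-queue AM percent: 10% * 50% = 5%.
        double effectiveAmPercent = maxAmPercent * queueCapacity;

        // Old behavior: turn that into "how many apps can I have running",
        // counting each AM as one minimum allocation regardless of its
        // real size (which is the defect being fixed).
        long amLimitMb = (long) (clusterMb * effectiveAmPercent);
        long maxRunningApps = amLimitMb / minAllocMb;

        System.out.println("effective AM percent = " + effectiveAmPercent); // 0.05
        System.out.println("max running apps     = " + maxRunningApps);     // 5
    }
}
```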
When I tried to translate this into something that actually pays attention to the real usage of the AMs, two approaches seemed reasonable:
1. Keep a single global used-AM-resource value and use the global AM percent everywhere (not apportioned), so the total cluster-level effect is what we want. In this approach, subdividing the AM-resource percent is replaced by summing the used AM resource across the queues. You can still override a given queue if you want "this queue to be able to go higher", which effectively allows one queue to exceed the others. That could starve other queues (bad), but that was already possible with the old approach, albeit in a different way (when the cluster came to be filled with AMs from one particular queue).
2. Subdivide the global maxampercent based on each queue's share of the baseline (as before), and then evaluate a per-queue AM-resource percent (and per-queue AM usage). This would not be a difficult change from the current approach, but I think it is problematic for the reason below.
The main reason I took approach 1 over 2 is that, with a complex queue structure and a reasonable level of subdivision on a smallish cluster, you could end up with a queue that can effectively never start anything, because its final limit is too small to ever fit one of the larger AMs we have these days. Sharing the limit globally makes this less likely: the "unused AM resource" allocated out to other queues with a larger share of the cluster isn't left sitting idle while leaf queue a.b.c, with a derived maxampercent of say 2% (which translates into 512 MB), can never start an ApplicationMaster that needs 1 GB, even though globally there's more than enough AM headroom to do so. It's the "this queue can never start an AM over size x" case that concerns me. There are other possible ways to handle this within option 2, but I'm concerned they would add complexity and change the behavior more than is needed to correct the defect.
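The starvation scenario above works out like this with made-up numbers (a 25 GB cluster is assumed purely for illustration; the 2% and 1 GB figures are from the discussion). Again, illustrative arithmetic rather than scheduler code:

```java
public class SmallQueueAmSketch {
    public static void main(String[] args) {
        long clusterMb = 25 * 1024;   // smallish 25 GB cluster (assumed)
        double leafAmPercent = 0.02;  // leaf queue a.b.c's derived 2%
        long amRequestMb = 1024;      // a 1 GB ApplicationMaster

        // Under option 2, the leaf queue's AM limit is fixed at 2% of
        // the cluster: 25600 MB * 0.02 = 512 MB.
        long leafAmLimitMb = (long) (clusterMb * leafAmPercent);

        // The 1 GB AM can never start in this queue, even though the
        // cluster as a whole may have plenty of unused AM headroom.
        boolean canEverStart = amRequestMb <= leafAmLimitMb;

        System.out.println("leaf AM limit MB  = " + leafAmLimitMb); // 512
        System.out.println("can start 1 GB AM = " + canEverStart);  // false
    }
}
```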
Junping Du: make sense? Thoughts? I may take a go at option 2 so we can evaluate it, but I'm concerned about the small-cluster/too-much-subdivision scenario being problematic.