Came across this issue while experimenting with fairness within a queue in CapacityScheduler.
Encountered a situation where, with FairOrderingPolicy and SizeBasedWeight enabled on a queue in CapacityScheduler, running GridMix V3 leads to all queue resources being consumed by AMs.
The settings are as follows:
Cluster total memory capacity=864GB, global AMResourcePercent=0.1, global MaxApplications=10000, minAllocationMb=2048, AM memory=2048, mapMemory=reduceMemory=2048
FairOrderingPolicy with SizeBasedWeight=True
According to these settings, at most 35 AMs can run simultaneously, and a total of 345 containers can run in the queue.
This was verified by running GridMix V3 (which submits 760 applications) with FairOrderingPolicy only (without SizeBasedWeight).
When the same test was run with FairOrderingPolicy and SizeBasedWeight=true, 345 AMs (applications) were running, and since all queue resources were used by AMs, no more containers could run, causing all applications to get stuck.
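The expected and observed numbers can be cross-checked with a rough calculation. This is only a sketch: the function names are made up for illustration, the queue's memory is inferred from the 345-container figure (345 x 2048 MB) rather than taken from the queue configuration, and rounding the AM count up is an assumption made so the result matches the 35 AMs reported. It is not the scheduler's actual code.

```python
import math

# Values taken from the settings in this report.
CONTAINER_MB = 2048          # minAllocationMb = mapMemory = reduceMemory
AM_MB = 2048                 # AM memory
AM_RESOURCE_PERCENT = 0.1    # global AMResourcePercent
QUEUE_CONTAINERS = 345       # total containers that fit in the queue


def max_concurrent_ams(queue_mb, am_percent, am_mb):
    """Approximate number of AMs allowed by the AM resource limit.

    Rounding up is an assumption chosen to match the reported 35 AMs.
    """
    return math.ceil(queue_mb * am_percent / am_mb)


def task_containers_left(queue_mb, running_ams, am_mb, container_mb):
    """Containers still available for tasks once the AMs are placed."""
    free_mb = queue_mb - running_ams * am_mb
    return max(free_mb, 0) // container_mb


# Queue memory inferred from the container count (an assumption).
queue_mb = QUEUE_CONTAINERS * CONTAINER_MB

expected_ams = max_concurrent_ams(queue_mb, AM_RESOURCE_PERCENT, AM_MB)
left_expected = task_containers_left(queue_mb, expected_ams, AM_MB, CONTAINER_MB)
left_observed = task_containers_left(queue_mb, 345, AM_MB, CONTAINER_MB)

print("expected AM limit:", expected_ams)
print("task containers left with expected AMs:", left_expected)
print("task containers left with 345 AMs:", left_observed)
```

With the limit respected, most of the queue remains available for map and reduce containers; with 345 AMs running, nothing is left, which is consistent with the applications getting stuck.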
It looks like SizeBasedWeight somehow changes or overrides AMResourcePercent.